A method for quickly importing and working with CSV files (part 1)

So many folks have asked me to write more on my coding style. So here it is: I’ve developed a two-parter blog post to present my thinking style and how I ultimately solved the problem.

The Problem

I have a client with several CSV files that must be checked and validated for certain data specifications. Each individual CSV file is essentially an output of a table from a query of a larger system. My client required I verify the contents of the CSV files and ensure the connections between the file are correct.

By connection, I mean that several of these CSV files have foreign and primary keys between them. The CSV files could also contain many errors. My client needs an Excel application that can verify these files link together with good data. With that information they can upload the files into another system.

Below is my thinking on how to solve the problem. In the next blog post, I’ll go through implementing the solution.

PowerPivot Won’t Cut It

Now, PowerPivot is actually a great candidate for this problem. But I couldn’t be sure my client (and the folks using my Excel application) would have the ability to install and use PowerPivot. So while I’d love to use PowerPivot for this in an ideal world, it just wasn’t a feasible solution for my client.

ADO is Out

So another would be to use ADO (here’s an example to see what I mean). The problem with using ADO is that it relies on references outside the normal VBA library. In previous years, that didn’t use to be a problem. But as I try to run many of my older Excle applications that had interfaced with Access and other Excel files using ADO, it’s just not a solution that ages well. The reason is that the libraries keep getting updated or names change. New versions of Excel require new connection strings. Over time, these types of solutions just break.

Iteration: My Own Worst Enemy

The most obvious solution—indeed, the one I really, really wanted to avoid—was to open these files all at the same time and then iterate through each record. This solution however is actually the bane of my existence.

For one, iterative algorithms like these explode in complexity as more records are added to the CSV file. And they take forever to complete.

Second, iterative algorithms for this type of problem really promotes something I don’t like. Specifically, it requires you create nested loops upon nested loops of iterations and compounding conditions.

This type of iterative solution is often the go-to for many organizations. It often results in hundreds of lines of VBA in a module with large sections of comments explaining what’s going on. But those comments never help. Indeed, when a company expresses concern about hiring an Excel developer because they fear what might happen when that person leaves, it’s because they’ve experienced one of these long-form iterative solutions. Let’s be honest: they take forever to complete; they’re a pain in the ass to debug; and when passed along to the next developer, they’re rarely touched because nobody understands them.

Copying/Pasting: Just Say No

So I decided early on I had no interest in opening these CSV files several at a time and creating a gigantic loop that only I could (barely) understand. I reasoned that I could do much of the required validation rather easily (and quickly) with formulas. But to use formulas on the data, I would have to load these CSV files into the current workbook somehow. My goal then was to have corresponding tabs filled in with the contents of each given CSV file.

Because each CSV files would always have the same name, I wouldn’t have to create new tabs on the fly. So that was nice: I could first create a tab for each corresponding CSV file (in this case, the tab’s name was the same as the CSV file).

So another obvious solution would be to open each file programmatically, and then copy its entire contents over using the Copy and Paste commands in VBA. This solution is common, but also fraught with problems. For one, it’s an expensive solution when the datasets are large. A large CSV file would require a lot of data copied to the clipboard. Several of these pasting operations happening one after another is a great way to crash a spreadsheet. Second, sometimes the clipboard just simply acts unpredictably; clearing out for now apparent reason.

So what did I do? Find out in the the next thrilling blog post… but feel free to share your ideas!

Advertisements

2 thoughts on “A method for quickly importing and working with CSV files (part 1)

    1. Jordan G Post author

      I couldn’t use Power anything… 😦 although it would have been the ultimate fit to the problem!

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s