Category Archives: No VBA

Excel Requests – HTTP for Humans

Written by Bjoern Stiel

APIs are the lifeblood of the internet, providing access to bits and bytes that make our personal and professional worlds work.

For example, Quandl provide a vast amount of financial and economic data through their API, while GitHub and StackOverflow let you pull out all sort of user and repo data through their API.

But how do we communicate with these APIs… from Excel?

Working with HTTP with all its different authentication methods and custom headers can be painful. Plus, Excel doesn’t play nicely with Json. Sure, VBA is always an option (VBA-Web seems a very decent project) and Power Query has been a huge leap forward, but neither of them are ideal if you need something quick or programmatic.

As a seasoned Pythonista, I am a big fan of Kenneth Reitz’ Requests package as it massively simplifies HTTP communication in the Python world. This inspired me to write Excel Requests, a dead simple Excel Addin to make Json over HTTP from Excel a walk in the park.

Installation

Installing Excel Requests is fairly simple, download the PowerShell installation script from the docs and run it (right click, Run with PowerShell).

Making your first Request

Start a new Excel (in case you haven’t already done so after the installation) and let’s make a GET request to get the Facebook share price history from Quandl.

 

>>> =REQUESTS.GET("https://www.quandl.com/api/v3/datasets/WIKI/FB.json")
    "https://www.quandl.com/api/v3/datasets/WIKI/FB.json"

Now, this function simply returns the URL which might be a bit surprising. This URL represents the Quandl server response. It is a (nested) Json dictionary that we cannot represent easily in our two dimensional grid system. So instead we return a “handle” to the cached dictionary, which we can access through two other Excel functions.

Let’s see what data the response dictionary contains:

>>> =REQUESTS.DICT.KEYS("https://www.quandl.com/api/v3/datasets/WIKI/FB.json")
    {"Text";"StatusCode";"StatusDescription";"ContentType";"Method";"Json"}

REQUEST.DICT.KEYS gives us the list of dictionary fields. In this case it is the response dictionary with meta information on StatusCode, StatusDescription, ContentType and Method and the actual response in the “Text” field. And, in the Json case, the actual response parsed as… you guessed it, another dictionary.

If we go back to https://www.quandl.com/api/v3/datasets/WIKI/FB.json we can see, that the timeseries data we are after is in dataset/data, so let’s pull it out (you can see from the Json representation that it is an array, so bear in mind to do it as an Excel array formula):

>>> =REQUESTS.DICT.GET("https://www.quandl.com/api/v3/datasets/WIKI/FB.json", "Json/dataset/data")
    {"2016-09-12",125.96;"2016-09-09",129.71; ...}

Authentication

You might often need to pass authentication. If you happen to have a GitHub account, you need to authenticate to pull information about yourself. A quick look at the GitHub API docs reveals that we can use basic authentication whereby the authentication string is simply <username>:<token>:

>>> =REQUESTS.GET("https://api.github.com/user",,,"<username>:<token>")
    https://api.github.com/user

 

 

Custom Headers

If you’d like to add HTTP headers to a request, simply pass in a two-column range to the headers parameter (column one refers to the header name and column two to the header value).

For example, Appveyor uses a (very slightly) different authentication method, usually referred to as Bearer Token Authentication

>>> =REQUESTS.GET("https://ci.appveyor.com/api/projects",,{"Authorization","Bearer <token>"})
    https://ci.appveyor.com/api/projects
 

 

Excel Requests is an open source Addin for Excel written in C#
To get started, check out the docs – http://excel-requests.pathio.com/en/latest/
If you need help, contact me bjoern.stiel@pathio.com

 

Written by Bjoern Stiel
Minimalist, Ex-Investment Banker, Founder of Pathio – Version control for Excel
Email me bjoern.stiel@pathio.com, follow me @bjoernstiel

 

 

Auto highlighting Excel charts

On your dashboards, reports, and charts it’s a good idea to alert the user to certain types of values. In a recent post, we discussed how to alert users to certain records on tables using custom formats. We’ll build upon that here but transition to how we might achieve a similar effect on a chart.

Take a look at the chart in the image below. I’ve built the chart so that no matter what values are being plotted, the greatest value is always highlighted in red. Following similar themes from previous blog posts, I achieved this effect with no VBA. The advantage to not using VBA is that it’s speedy and requires no iteration. On the other hand, for this effect to work, you’ll still need to test if every point is the maximum by using an Excel formula. For small series such as the one shown here, this isn’t really a big deal. But for series with many more points–and more more complex projects–it’s an additional column of data you’ll need to take care of.

image

Let’s do it!

Luckily, the implementation is super easy. So let’s get started. For this example, I needed to create some fake data. So for the values B2:B5, I used RANDBETWEEN. In cell B8, I then find the maximum value that has resulted. (Of course your data won’t be random, but to demonstrate this chart will always pick the greatest value, I use RANDBETWEEN.) The formulas for these items are shown in the following image.

image

Once we have the maximum value, we need to create a parallel series where that maximum value is the only value that appears. Everything else should be #N/A (or something similar–more on that in the discussion section). We could use zero here, but zero is actually a value on our chart. So it’s better we use something that represents a non-number. NA() is perfect for this. The image below shows an IF formula used in the adjacent column to test whether a specific region holds the maximum value.

image

Now, we should have two columns. The first column contains our original data; the second column contains only the maximum number from our original and #N/A’s. At this point, you’re ready to insert a chart. The easiest thing to do is to highlight our data range and insert a new column chart. I’ve done just that below.

image

Remember, the  values on my screen are being generated randomly, so the orange column may be at another range (or even another color) on your screen. At this point, what we’ll want to do is plot the “Max Series” –  the orange column – on the secondary axis. You can do this easily by right-clicking the orange column and going to Format Data Series…..

image

Within the Format Data Series dialog box (Excel 2010 and previous) or context pane (Excel 2013+) go to Series Options and select Secondary Axis. The image below will result.

image

All you have to do now is some clean up and then you’re finished. Below, you can see I’ve removed an axis and adjusted some formatting.

image

Discussion

In this simple example, we just used the maximum value, but you could highlight all sorts of values depending upon what you’re looking for. Perhaps you would like to highlight one or more columns whose values are in a specified range; or, perhaps you just want to direct your viewer to the smallest value. Really, the rules are endless but the dynamic is the same. You just need an additional column that parallels the first.

You may have noticed we introduced #N/A errors where a value did not match the maximum value. Generally, having unhandled errors on the spreadsheet can slow things down. (I’m not sure if that extends to generated errors like these, perhaps a reader could verify for me? I would think they make a contribution.) One way around this is to use a number like negative one instead. If you go that route, just make sure you’re only plotting positive data. You could also use a blank text string, “”. I’ll leave that choice up to you, but remember all options come with tradeoffs. Try each out and see which works the best for you.

Download file: Max Chart Highlight

“Everyday” Random Shuffling with Formulas

In this article, we’ll talk about how to complete random shuffling with only formulas. Shuffling methods via VBA or a dime a dozen. But it’s much harder to perform a formula-based random shuffle since formulas don’t really allow for iteration. Non-VBA methods therefore require we think about these problems differently (what I describe in Dashboards as “Thinking Outside the Cell”). I call this the “everyday” method because it’s technically imperfect. From  a statistical standpoint, the results are not uniformly distributed. That means the method isn’t perfectly random at all times. We’ll attend to that later in the article.

Step 1. Lay it all out

For now, let’s take a look at a spreadsheet showing our deck of cards (the Excel file is available for download at the end of this article).

image

Though it’s slightly cut off in the picture, there is a complete 52-card deck.

Step 2. Add Some Randomness

Because this is a shuffle, we need to add some randomness to reorganize the cards in the deck. Perhaps the easiest method to do this would be to use RANDBETWEEN. Here, we’ll use =RANDBETWEEN(1,52) since there are 52 cards in total. In the next image, I’ve dragged this formula down so that it’s used 52 times.

image

Anyone who has attempted shuffling in Excel before should immediately realize the problem with using RANDBETWEEN as is. First, there are no guarantees RANDBETWEEN will produce every single number between 1 and 52. Second, there are no guarantees RANDBEETWEEN won’t produce a series of repeated numbers. In the image above, the cells highlighted in peach are duplicates! Clearly, our method needs a few more steps!

Step 3. Add Some Noise

The way we solve both issues presented above is by adding noise. Noise is a small amount of “error” we add to add each randomly generated number. This small amount of error will ensure we also produce a unique list of random numbers. In the image below, I’ve added a small amount to each random number. The small amount is essentially the current card index divided by an arbitrarily large number. As you know, small numbers divided by big numbers produce very small numbers. Using the card index ensures the small amount of noise added is always different, since no cards have the same index. Alternatively, I could have used the volatile function ROW instead of the card index.

If you look at rows 11, 18 and 22 in the picture, you’ll see the RANDBETWEEN resulted in a 50 for each. But because we’ve added some noise, each item has a different resulting value.

image

Step 4. Re-RANK ‘em

What we need to do now is to transform the values produced into numbers between 1 and 52. We can do this using the RANK function. Since each random number is now unique, we know that each number must have a unique rank within the entire range of random numbers produced. For instance if we were just to consider the first three numbers produced in the image above–21.00002, 49.00003 and 28.00004–these numbers, when ranked in descending largest-to-smallest order would yield rank results of 3 (21.0002 is the lowest numbers on the list), 1 (49.00003 is the greatest), and 2 (28.0004 is in the middle). We’ll do this RANKing across all cards (shown below).

image

Step 5. Use INDEX to Retrieve the Results

If our list of random numbers is unique then the their ranks must also be unique. So we’ve now addressed the two issues described above. We now have a resulting list of numbers that fully spans the range of integer numbers from 1 to 52. The last step is to use this resulting list as an index back onto the original card set.

image

And that’s it! You now have a shuffled deck.

Discussion

This setup was likely easier than you though to build. However, it does come with some unfortunate tradeoffs. The biggest trade off is that it’s not perfectly random. Here’s what I mean: let’s say two random numbers have a collision (i.e. the same random number is picked between them). In the image above, rows 7 and 14 collide at random number 42. We’ve seemingly resolved this issue by making each cell unique. However, because we’re using RANK, the item in row 7 will always be closer to the top of the deck than the item in row 14. This is because the noise is not randomly generated. Collisions in this method will always give sorting preference to the numbers that collide first. If we ran this random sorter a few thousand times, the resulting frequency of each card would demonstrate the sort order is not completely random.

The easy away around this is to have the random numbers collide less often. We use RANDBETWEEN(1,52) in this example because it makes thinking about the problem more approachable. But, since we’re checking the rank of unique numbers in a list, the mathematics do not dictate the random numbers be in any specific range. In fact, we could just as easily replace the random number generating part of the formula to something like RAND()*10000. Now that the range of possible numbers is bigger, it’s less likely they’ll be a collision in our shuffle. However, if we’re thinking broadly about shuffling items, the more items you have to shuffle the more likely they’ll be collisions. That’s just because the potential for collisions increases with each new random number generated.

image

Alternatively, you could implement a Fisher-Yates shuffle. This method will always generate random numbers, and it is indeed possible to use only formulas to implement. However, it’s a bit more complicated and perhaps best served as an article for another day (but you’re welcome to try it on your own). Until then, this method will suffice for most demonstrations.

Download the worksheet used for this article: Unique Random Sorting

Custom Formats for Dashboards and Spreadsheet Applications

Conditional formats are volatile. If you have too many, you see the effects of this volatility in serious calculations slowdown. As the function guru Charles Williams puts it, “…conditional formats seem to be super-volatile: they are evaluated each time the cell that contains them is repainted on the screen, even in Manual calculation mode….” But even too many non volatile functions can slow things down.

The good news is you have alternatives. For some specific instances you can get away with an Excel feature called Custom Formats, which is faster than conditional formatting and non volatile. It can also replace the IF and other conditional formulas.

However, you’ll deal with some major trade-offs. Custom formats as you’ll see can be confusing. And, because they’re not commonly used, others may not understand the work you’ve completed — and you yourself might even forget since custom format code is obscured away from the spreadsheet in a dialog box.

We’ll talk about those trade-offs and go through a few examples in this article.

Introduction to Custom Formats

Excel comes with an array of built-in custom formats. For instance, in the image below you can see I’ve placed a simple date on the worksheet. The representation of the date as month/date/year is in fact a custom format that Excel applies by default. You can see the format applied to any cell by selecting the cell and selecting More Number Formats from the format dropdown.

image

The image below shows custom code applied to the date on the spreadsheet. Notice initials like d and m stand in for things like day and month. These are part of the custom format codes that can be used to represent dates.

image

Conditional Expressions with Custom Formats

It’s the combination of these letters, numbers, and codes that allow you to make all sorts of customized formats. The codes in the previous image feature no conditional expressions; however, you can write conditions as formatting codes. The basic prototype for conditions is as follows: [Condition 1] ; [Condition 2]; [Else Condition]

Let’s say you apply the following condition code to a cell: [>10] “high”; [>5] “medium”; “low”.  The image below shows this code applied to the cells in column C. Column B simply equals the values in column C. Because the custom format is applied to column C, the values appear as the text we’ve outline in our formatting code. In C3, the value is 15; our custom format code tells Excel to represent this number as  “high.”

image

Notice cell B5 is simply set to be equal to cell C5 with a formula. In fact, all the cells shown in column B simply refer to the adjacent cells in C. The custom format acts as a mask for the actual underlying values. If you were to write =C3+5, the result value would be 20 (assuming you’ve switched the new cell to a General format) even as what’s being shown in C3 is text and not a value. Dates work in the same way: Excel stores dates as discrete whole numbers but they’re represented in a common date format. Custom formats only change how values are represented on a the spreadsheet; they does not affect underlying numerical values.

Custom Formats for Dashboards

Let’s apply what we’ve learned to a dashboard. Let’s say you want to create an alert where a colored dot character, ●, is shown next to over-budget project. The image below shows this effect. You would like a softer orange alert for projects $10,000 over budget and a harsher orange for projects even more over budget than that. The following image shows this effect achieved through custom formats.

image

To create this on your own, you would first figure out which projects are over budget in Column A. The image below shows we can do this easily by subtracting the amount spent from the budgeted amount.

image

Once this calculation is completed we can apply the following custom format to column A:

[Color46][>20000]●;[Color45][>=10000]●;[Color2]

So let’s break this custom format down. First, you might be wondering how to get the dot character. Well, you can copy and paste it from in this blog post. (That’s probably the easiest way!) Alternatively, you can insert it (or any other character) by going to the Insert tab and selecting Symbol. The dot resides in the Geometric Shapes subset. I usually insert the character to a random cell and then copy it from there, pasting it into the custom format as needed.

The next item to point out is the [ColorXX] code. Excel allows you to change the font colors depending upon the conditions satisfied. The numbers 45, 46, and 2 refer to color codes within Excel’s internal colors. Unfortunately, there are only 56 colors available, which is admittedly very constraining.  Click here to see the entire color list. (You might want to bookmark this link so you have the color table handy at all time.) For reference, Colors 46 and 45 are the dark and light orange respectively; Color2 is white.

Once you have the values in Column A as shown above, it’s a simple matter of selecting the values and applying the custom format. This is shown below with some resizing of column A to reflect how I want to see the dots. I’ve also increased the font size. Notice the custom format dropdown box on the ribbon now says “Custom” to reflect the custom format code we’ve created and applied.

image

Discussion

There’s much more to custom formats. For instance, the [>10] “high”; [>5] “medium”; “low” code from above could replace the following nested IF formula =IF(A1 > 10, “high”, IF(A1 > 5, “medium”, “low”)). The greatest advantage to custom formats however is that you can dynamically change how values are represented on the spreadsheet without the weight of additional calculation and volatility. Excel can instantly change how it represents an underlying value much faster than it can recalculate.

But as you can see custom formats are also pretty limited in formatting options; they’re sometimes hard to understand; and how they work isn’t always obvious (their code is hidden away in a dialog box). Indeed, for ninety-five percent of cases, the difference in calculation speed of an IF formula and the dynamic representation of a custom format is nominal.

Still custom formats present many interesting options for developers. For dashboards and models featuring heavy calculation, they prove to be a useful alternative.

Dashboards for Excel

My book Dashboards for Excel uses examples like this and so much more. Buy it today! (The book is out and shipping. Amazon is fixing the listing.)