Category Archives: Formulas

“Everyday” Random Shuffling with Formulas

In this article, we’ll talk about how to complete random shuffling with only formulas. Shuffling methods via VBA or a dime a dozen. But it’s much harder to perform a formula-based random shuffle since formulas don’t really allow for iteration. Non-VBA methods therefore require we think about these problems differently (what I describe in Dashboards as “Thinking Outside the Cell”). I call this the “everyday” method because it’s technically imperfect. From  a statistical standpoint, the results are not uniformly distributed. That means the method isn’t perfectly random at all times. We’ll attend to that later in the article.

Step 1. Lay it all out

For now, let’s take a look at a spreadsheet showing our deck of cards (the Excel file is available for download at the end of this article).

image

Though it’s slightly cut off in the picture, there is a complete 52-card deck.

Step 2. Add Some Randomness

Because this is a shuffle, we need to add some randomness to reorganize the cards in the deck. Perhaps the easiest method to do this would be to use RANDBETWEEN. Here, we’ll use =RANDBETWEEN(1,52) since there are 52 cards in total. In the next image, I’ve dragged this formula down so that it’s used 52 times.

image

Anyone who has attempted shuffling in Excel before should immediately realize the problem with using RANDBETWEEN as is. First, there are no guarantees RANDBETWEEN will produce every single number between 1 and 52. Second, there are no guarantees RANDBEETWEEN won’t produce a series of repeated numbers. In the image above, the cells highlighted in peach are duplicates! Clearly, our method needs a few more steps!

Step 3. Add Some Noise

The way we solve both issues presented above is by adding noise. Noise is a small amount of “error” we add to add each randomly generated number. This small amount of error will ensure we also produce a unique list of random numbers. In the image below, I’ve added a small amount to each random number. The small amount is essentially the current card index divided by an arbitrarily large number. As you know, small numbers divided by big numbers produce very small numbers. Using the card index ensures the small amount of noise added is always different, since no cards have the same index. Alternatively, I could have used the volatile function ROW instead of the card index.

If you look at rows 11, 18 and 22 in the picture, you’ll see the RANDBETWEEN resulted in a 50 for each. But because we’ve added some noise, each item has a different resulting value.

image

Step 4. Re-RANK ‘em

What we need to do now is to transform the values produced into numbers between 1 and 52. We can do this using the RANK function. Since each random number is now unique, we know that each number must have a unique rank within the entire range of random numbers produced. For instance if we were just to consider the first three numbers produced in the image above–21.00002, 49.00003 and 28.00004–these numbers, when ranked in descending largest-to-smallest order would yield rank results of 3 (21.0002 is the lowest numbers on the list), 1 (49.00003 is the greatest), and 2 (28.0004 is in the middle). We’ll do this RANKing across all cards (shown below).

image

Step 5. Use INDEX to Retrieve the Results

If our list of random numbers is unique then the their ranks must also be unique. So we’ve now addressed the two issues described above. We now have a resulting list of numbers that fully spans the range of integer numbers from 1 to 52. The last step is to use this resulting list as an index back onto the original card set.

image

And that’s it! You now have a shuffled deck.

Discussion

This setup was likely easier than you though to build. However, it does come with some unfortunate tradeoffs. The biggest trade off is that it’s not perfectly random. Here’s what I mean: let’s say two random numbers have a collision (i.e. the same random number is picked between them). In the image above, rows 7 and 14 collide at random number 42. We’ve seemingly resolved this issue by making each cell unique. However, because we’re using RANK, the item in row 7 will always be closer to the top of the deck than the item in row 14. This is because the noise is not randomly generated. Collisions in this method will always give sorting preference to the numbers that collide first. If we ran this random sorter a few thousand times, the resulting frequency of each card would demonstrate the sort order is not completely random.

The easy away around this is to have the random numbers collide less often. We use RANDBETWEEN(1,52) in this example because it makes thinking about the problem more approachable. But, since we’re checking the rank of unique numbers in a list, the mathematics do not dictate the random numbers be in any specific range. In fact, we could just as easily replace the random number generating part of the formula to something like RAND()*10000. Now that the range of possible numbers is bigger, it’s less likely they’ll be a collision in our shuffle. However, if we’re thinking broadly about shuffling items, the more items you have to shuffle the more likely they’ll be collisions. That’s just because the potential for collisions increases with each new random number generated.

image

Alternatively, you could implement a Fisher-Yates shuffle. This method will always generate random numbers, and it is indeed possible to use only formulas to implement. However, it’s a bit more complicated and perhaps best served as an article for another day (but you’re welcome to try it on your own). Until then, this method will suffice for most demonstrations.

Download the worksheet used for this article: Unique Random Sorting

Advertisements

Change the Font Size, Color, and Style of an Excel Form Control Label

Anyone who has used a Form Control Label likely knows its limitations: you can’t increase the font-size, -color, or style.  Below, you can see that these formatting items have been “grayed out” in the Font group on the Excel Ribbon.

To be sure, the Label control has received a lot of flack for these deficiencies.  A look through some Excel forums shows suggestions to use a TextBox shape or an ActiveX Label instead of the hapless Label control.It’s a tragedy since the other form controls are lightweight and easy to use. Some forum posters even said Labels are best used to cover cells you don’t want the user to click. So sad.   

But things are looking up.  I’ve since discovered you can take a boring Label Control from this…
…to this!
How?
As it turns out, Labels take on the text-font and -size features of a referenced cell.  So, to make my label look this this, I simply linked it to a pre-formatted cell, like G2, below.
In fact, this is the same mechanism to link a Textbox shape to a cell.  The difference here is that a Textbox can take an unformatted cell and apply new formatting on the front-end, when it’s displayed to the user.  Above, we see that the cell must first be formatted, then linked.  In fact, whatever formatting exists in the cell when you first make the link, the label will maintain this format until a new link is created.  For example, if I were to change G2 to a black color and a smaller font, the label would not show these new changes (however, it would change its text if I changed the value in G2 to something else).  So to change the Label’s formatting — even when it’s linked to the same cell — you’ll need to click the label, click the formula bar, and retype the cell link.  
Admittedly, everyone else might have already figured this one out.  However, I’m still very excited.  Don’t get me wrong, Textbox Shapes are great, but having too many could become expensive on your spreadsheet, especially if you are constantly updating the screen (how many redraws can your computer handle before things start to slow?).  Labels, as Form Controls, inherently carry less bloat and overhead.  They’re perfect for dynamic dashboards.  

Numbering Grouped Data for Pivot Tables

We’ve all had some experience with Excel’s automatic numbering.  For example, if you have a simple list and you would like to add consecutive numbers to a column next to your data, you’d simply select the cells containing your first two numbers and then drag accordingly.  Take a look at the sheet below to see an example.

But real-world lists aren’t always so simple.  And for certain data, a simple consecutive list just won’t do.  Sometimes we receive spreadsheets that have some type of intrinsic grouping but no unique group identifier. The challenge is that we would really like to apply Pivot Tables to this data, but we can’t do anything until there exists some type of group identifier.

Problem #1: Items grouped without collation
Take a look at the datasheet below from my nonexistent accounting information system.

Because I’m smart(ish), I know that a new group begins every third row of data.  The first three rows of data (Excel rows 2,3, and 4) represent store 1, the next three store 2, and so forth.  I would like to add an additional group_id column such that each grouping is numbered consecutively.  Something like this:

How?  I use this formula, 
=INT((ROW()-2)/3) + 1
then fill downward.  This works by applying integer division in the amount of each item in a group to the current row. So, for row 3, we have 

=INT((ROW()-2)/3) -> =INT((3-2)/3) – > =INT(1/3) -> = INT(0.33) = 0.  


(If it’s been a while since your last algebra class, just think of it as dividing and then “rounding down” the result.)

I add one (“+ 1”) at the end of the formula so that my grouping doesn’t start at zero.  That’s optional.

Notice though, that I subtract 2 because my data starts on row 2.  If I had started on row 1, I would likewise subtract 1; for row 3, subtract 3.  Here’s a cheat sheet:

For Items Grouped Without Collation

=INT((ROW()-First row of data on spreadsheet) / # Items per Group)[+1] optional  

Problem #2: Items grouped with collation


In the following spreadsheet, we’ve culled information from many different spreadsheets maintained by many different people.

The problem is that each spreadsheet administrator used a different naming convention for the same account (see the highlighted accounts).  And take a look at the final Retained Earnings and note that it is labeled ”  RE.”  Those extra spaces can creep into the spreadsheet and easily go unnoticed.  What a nightmare.

But wait, our list has some semblance of order: accounts can be grouped every sixth row. We just need to group each item, one through five, until the end of the list.  Like so,

How?  With this formula,

=MOD(ROW() – 2,5) + 1

This should look familiar to the one above.  But instead of using integer division, we’re now using modulo division; that is, we’re interested in the remainder.  Take row 3

=MOD(ROW()-2, 5)+1 ->  =MOD(3-2, 5)+1  – >  =MOD(1, 5)+1  -> = 1+1 =2.

Here’s the cheat sheet:

For Items Grouped With Collation

=MOD((ROW()-First row of data on spreadsheet,# Items per Group)[+1] optional  

Final Thoughts

1.  Once you’ve created your new group_id column, it’s a good idea to select your new work, copy, and then paste as values.  If the groupings aren’t going to change later, there’s no reason to keep it as a formula.  Remember, fewer formulas means better Excel performance — especially if you plan to use a Pivot Table later.

2.  If you’re not into the numbering scheme, create a lookup table that maps the numbers to a proper name.  Create another column at the front of your data and use a lookup method, like Index, to map the correct names.  Then do a copy/paste values.

3. If you have a list scheme you use quite often, you can actually save it as a custom fill series and then use it later.

Remember:

For Items Grouped Without Collation

=INT((ROW()-First row of data on spreadsheet) / # Items per Group)[+1] optional  

For Items Grouped With Collation

=MOD((ROW()-First row of data on spreadsheet,# Items per Group)[+1] optional  


Truncating Long Text to Fit into Cells

Have you had a list of names but found your cell width too small to show every name?  Take a look at this:

I had my nonexistent secretary print out this fake itinerary for me.  Nothing lines up neatly.  Look how some of my events overlap the cell boundary between columns B and C.  My 2012 End-of-world Steering Committee meeting on February 1st, 2013 goes long and is then cut-off by the date!

To fix this, you could have Excel automatically size Column B, but that’s a bit unpredictable to have on a dashboard (what if other content is sized off the screen?).  We’ll just have to accept that we can’t show everything, but we can make it look neater.  We’ll employ some type of text truncation method similar to what Microsoft uses to display the names of files in Windows Explorer: if the text to be displayed is greater than a certain character length, simply truncate and add a “…”.

The “Original Recipe” Method

First thing we’ll need to do is figure out how many characters we’ll want to display before truncation.  There’s really no way to do this without trial and error. But, to make life easier, we can use =LEN(“Your Text Here”) to return the length of your text; or, we can use something like  =LEFT(B3, 4) to return the first four characters of text in cell B3.  Keep trying until you find a good character length.  

Hopefully, you got a good handle on using LEN and LEFT above because we’ll be using them from here on out.  Also, I’m going to use 25 characters as my desired character length before truncation.  Below I use  LEFT to pull the first 25 characters from each event name.

Now we’ll need to know which names to add the “…” to.  Following ExcelHero’s discussion on using IFs, we’ll avoid using one here; not to worry, we don’t really need it.

As it turns out, LEFT can take a zero value character length — it just returns nothing.  So to have Excel automatically return what we want, we’ll do this:

=LEFT(“…”,LEN(B3)>25)

Take a look at that second argument on the right.  It takes the length of our event name and tests if it’s greater than our cutoff length of 25.  If it’s less than 25, Excel will return a zero and display nothing.  If it’s greater than the cutoff, Excel will return a 1 and display the first character in our text.  But there’s something I found out while writing this tutorial, I expected Excel to only return one period (“.”), but instead it returns all three (“…”) even if you only put a 1 in on the right.  I’m guessing that the ellipses are treated as one character; however, I’m dubious that this works in every case on every computer, so to have your formula return three characters or zero, you’ll simply change it to:

=LEFT(“…”,(LEN(B3)>25) * 3)

Now you’ll need to concatenate both strings together using the concatenation operator, &.

Finally, as you can see in cell, H7, there is a space between the event name and the ellipses.  This is because the last character before the cutoff was a space.  To fix this, surround the first string formula with the TRIM function; this will automatically delete any leading or ending spaces.  
Putting it all Together

To finish, we’ll combine all our steps into one formula, like so:

=TRIM(LEFT(B3,25)) & LEFT(“…”,LEN(B3)>25)

To get:
…mmmmmmuch better!

Download my example: Spillover Design Example.xlsx