Make Excel Faster
September 24, 2019
Last updated
September 24, 2019
Last updated
Excel is very, very pervasive in todayâs businesses. At Dataquest, we generally recommend working with code for quite a few reasons, and many of our data science courses are aimed at teaching efficient coding for data analysis and data science. But no matter how strongly you prefer working with Python, at the end of the day there is sometimes a need to present your findings or share your data using Excel.
But that doesnât mean canât still enjoy some of the efficiencies of Python! In fact, using a library called xlwings
, you can use Python to make working in Excel faster.
In this xlwings tutorial, weâll walk through how to use Python in Excel to execute and use some common operations like deleting rows based on certain criteria, using Excel functions and formulas, autofilling, creating sheets, charts, etc. In order to follow this post you should be familiar with basic Python concepts (objects, methods, attributes, functions) and Pythonâs syntax and have an intermediate knowledge of Excel and VBA.
We will be working with a data set that contains information regarding the draws of an European lottery called EuroMillions. This data set was downloaded from this link and it contains all the EuroMillions draws up to, and including, the 20th of September. The data available at that link should be updated with the latest information up to whenever youâre reading this post, but in case itâs unavailable, hereâs a CSV file with the data from that link as of September 20.
Euromillions is a multinational lottery available in some European countries, specifically in Andorra, Austria, Belgium, France (including overseas regions and collectivities), Ireland, the Isle of Man, Liechtenstein, Luxembourg, Monaco, Portugal, Spain, Switzerland and the United Kingdom (source).
As of this writing, the draw consists of five numbers from a pool of 50 numbers (numbered 1 through 50) and two numbers called lucky stars
from a pool of 12 numbers. In order to win the jackpot, participants must correctly choose all drawn numbers and lucky stars. The largest jackpot ever won was âŽ190 million. (Note, though, that our data set denominates winnings in Pounds, not Euros).
In this tutorial, weâll use Python and xlwings with Excel to clean up a data set and then generate some graphs to visualize which numbers win the EuroMillions most frequently.
No. | Day | DD | MMM | YYYY | N1 | N2 | N3 | N4 | N5 | L1 | L2 | Jackpot | Wins | |
627 | 514 | Fri | 24 | Aug | 2012 | 6 | 5 | 19 | 37 | 12 | 7 | 3 | 37709047 | 1 |
230 | 911 | Tue | 14 | Jun | 2016 | 13 | 50 | 42 | 39 | 34 | 11 | 9 | 11928000 | 0 |
998 | 143 | Fri | 3 | Nov | 2006 | 13 | 11 | 44 | 24 | 49 | 9 | 3 | 88344099 | 0 |
1071 | 70 | Fri | 10 | Jun | 2005 | 37 | 32 | 47 | 7 | 6 | 7 | 1 | 21734699 | 0 |
522 | 619 | Tue | 27 | Aug | 2013 | 7 | 40 | 38 | 43 | 30 | 6 | 2 | 12931500 | 0 |
The first column is the draw number, columns N1-L2
are the drawn numbers and lucky stars (by the order they were drawn), the Jackpot
column is the jackpot in Euros and the Wins
column tell us how many bets hit the jackpot.
xlwings
xlwings
is a Python library that makes some of the data analysis features of Python available in an Excel instance, including support for numpy
arrays and pandas
Series and DataFrames. Like any other Python library, it can be installed using common methods like pip
or conda
, but you can access the documentation for xlwings
here if you need additional details.
Note that youâll need to have a version of Microsoft Excel installed on the computer youâre using to do this xlwings tutorial.
xlwings objects
In xlwings
there are four main object types which are, in decreasing hierarchical order: App
(which represents an Excel instance), Book
, Sheet
and Range
. In addition to these weâll also be dealing with Chart
and Shape
objects. You can find useful information regarding these and other objects in the official documentation, but weâll be looking at each of these objects one at a time.
Letâs start by creating a Book
instance and naming it wb
(workbook).
When you run that code, it should look something like this.
Notice that when the code cell is run in the Jupyter Notebook, Excel is launched automatically.
By instantiating a Book
object, an App
object that belongs to our book object is created automatically. Hereâs how we can check all opened Excel instances.
Note: weâre not going to include gifs for every step of this tutorial, because we donât want this page to be a hassle to load for people with slow or limited internet connections. However, subsequent code-running steps should look similar to what we can see above: when we run a cell in Juypter, the Excel spreadsheet is updated in accordance with whatever code weâve run.
The object xw.apps
is an iterable. To check what workbooks belong to the unique instances in this iterable we can invoke the books
method on it like so.
As expected the only instance of this iterable is the workbook wb
. We check this fact below.
Similarly, we can check what sheets belong to this workbook:
We can also refer to sheets by their name:
Sheet
objects have a name
attribute that works as expected. Let us change the name of our only sheet.
We can move data from certain Python objects (e.g. lists and tuples) into Excel. Letâs move the data in our dataframe into the sheet EuroMillions. To do this, weâll make use of range
to create a range object that stores the data from our DataFrame in a range of cells in Excel, starting in this case with the cell A1:
Hereâs how that will look:
As we can see, the index column of df
was also moved to Excel. Letâs clear the contents of this sheet and copy the data without the index.
It will be useful to be able to tell where our table ends. More specifically, weâll need the last row that has data in it. To this end, we can use the end
method and the row
attribute of Range
objects.
The row
method, not surprisingly, returns the row
of the Range
object.
The method end
takes in as an argument a direction ("up"
(or 1
), "right"
(or 2
), "left"
(or 3
, "down"
(or 4
)) and returns another range object. It mimics the very common action in Excel CTRL+Shift+Arrow.
It checks out!
The API property
Not every Excel functionality is available as a native xlwings
feature. Sometimes weâll have to find workarounds to do what we want. Fortunately xlwings
makes this very easy for us. From the Missing Features section in the official documentation:
Workaround: in essence, xlwings is just a smart wrapper around pywin32 on Windows and appscript on Mac. You can access the underlying objects by calling the api property. The underlying objects will offer you pretty much everything you can do with VBA, using the syntax of pywin32 (which pretty much feels like VBA) and appscript (which doesnât feel like VBA). But apart from looking ugly, keep in mind that it makes your code platform specific (!). Excel Visual Basic for Applications is a rich source of explanations for the various existing Excel objects.
Sorting is one such functionality that is missing from xlwings
. You may have noticed that the records are ordered from the most recent to the oldest draw. In the next few steps weâll be reversing the order.
The object ws.range("A2:N{row}".format(row=last_row))
is a Range
object. Appending the api
property to it yields a VBA Range object which in turn gives access to its VBA features.
Weâll be using the Sort property of this VBA object. In its simplest application, Sort
takes in two arguments: the column by which we want to order the table (as a VBA Range object), and the order type (whether we want to sort it in ascending or descending order). The second argumentâs parameterâs documentation can be seen here. Weâll be sorting in ascending order.
Putting all of that together looks like this:
Hereâs how itâll look on your screen after running (note that the first column has changed, and itâs now sorted in ascending rather than descending order.
Analyzing Our Data
One problem weâll encounter in trying to analyze this data set is that the dates are scattered across three different columns. Weâll need to compress it into a single column. To do this, weâll concatenate the columns appropriately in Excel, using Python. We begin by inserting a header in the empty adjacent column.
Next, we can insert the Excel formula we want to use as a string. Note: the specifics of what argument separator you should use depends on your machineâs local regional settings. On mine, the argument separator is a comma and thatâs what Iâve used in this tutorial, but on yours it might be a semicolon.
Having inserted the formula in the first cell, itâs second nature in the regular Excel workflow to autofill the rest of the cells through the end of the table. Autofill is a method of the VBA Range
object. It takes as parameters the destination cells as a VBA Range
object and the type of fill. Weâre interested in the default one whose enumeration is 0
.
Hereâs roughly how the screen should look after this step; note the new âDateâ column on the far right.
We can also use the named form of the type of fill we want. To do this weâll need to retrieve it from the module xlwings.constants
which contains named versions of the enumerated arguments of most VBA properties. Recall that you can always check the available attributes by printing dir(xlwings.constants)
.
(If youâre not familiar with it, dir
is a native Python function and can take several kinds of arguments (modules, classes and regular objects (like lists and strings). For example, if you print dir(some_list)
it will give you all the methods and attributes that you can use with a list.)
What we did above could also be achieved with the following code snippet.
Since weâll be using this often, weâll create a function that applies the default fill, given:
a worksheet
a string representing a cell in the worksheet
a last row to fill.
To do this weâll introduce a new Range
method called get_address
. It takes in four Boolean arguments and returns a string that identifies the range with varying levels of detail. Hereâs a hopefully enlightening illustration of this method.
And now we define our function.
In order to save Excel from making unnecessary computations, weâll replace the formulas we just inserted on column O
with hard coded values. Before we do this let us take a moment to think about what kind of Python object is Range.value
when Range
is an array.
Itâs a list! Letâs see its first ten elements.
If we insert this list into any range, it will place the values horizontally, which isnât what we want. In order to place them vertically, weâll need to use the options
method of Range
objects with the option transpose=True
as a parameter, like this:
And we can now delete columns C
through E
.
EuroMillions format has suffered some mild modifications over the years, the last one happened on September 24, 2016.
From September 24, 2016 the amount of lucky stars changed from a pool of 11 to a pool of 12 numbers. In order to make a meaningful analysis, weâll only consider draws that occurred after the last modification. The next code snippet finds the last game that happened prior to the modification and names it to_delete
.
And we can now delete every row from the first game to to_delete
.
Hereâs where weâre at, at this point:
Having finished preparing the data, we will now format this table. Weâll begin by setting the font of the first row to bold.
We can follow that by formatting the Jackpot
column in millions. Note that the string format below depends on your machineâs local regional settings. If the format looks odd on your end, try swapping the commas with the dots. More on Excel custom formats here.
As an auxiliary step for what will follow, weâll find the letter corresponding to the last column that has data.
Letâs now add a border to the bottom of the header cells. Similarly to what we have been doing, weâll use the api
property. Additionally we are going to need the Border
property of Range
objects, the border orientation enumeration and the style of the border. Weâll be setting a double edged border (line style -4119
) on the bottom of the header cells only (orientation 9
).
Let us now autofit by both rows and columns.
Oops! This looks a bit squished, let us set the width of all columns to that of column J
which seems to be the largest. Here is the ColumnWidth
documentation that we are using just below.
That should look better. And weâre done with this sheet!
Let us add
a new blank sheet called Frequencies
and let us assign it to the Python name frequencies
.
We will be populating this sheet with the absolute frequencies of each number and each lucky star in the data set we just organized in the sheet EuroMillions
.
Below weâll be inserting a header for the frequencies in cell B1
and in cell B2
we will input a formula that will count how many times the value in A2
occurs in the range C2:G201
. In other words, we will count how many times 1
occurred in the columns N1-N5
. After this we will autofill the rest of the cells on column B
to do the same for their respective rows.
And we do the same for the lucky stars:
Hereâs how our new sheet should be looking at this point:
We are nearing our goal. Letâs create a sheet called Graphs
.
Now weâll create a Chart
object. This will only produce a blank white box, but donât worry! Weâre going to use that box to chart our data in a moment.
We can name
our chart similarly to what we did for sheets. The method set_source_data
allows us to define the data source for our chart by passing in a range object.
Excel will try to guess what the x-axis should be, but we can force it to be the numbers we created on Frequencies
by using the VBA Chart
method FullSeriesCollection
. We can edit the chart by using the object at index 1 of nr_freq.api
:
Excel is very good at guessing what kind of chart the user wants, but just in case it guesses incorrectly, we will force it to be a column chart. The various types of charts are listed here. Alas, the only documentation bridging these to the possible values of the chart_type
attribute is the source code itself.
We will now define the height and the width of the chart. The measure unit will be points.
Hereâs what we should be seeing at this point:
The SetElement
method together with the parameter 2
sets the title above the chart. See other arguments here.
And we add the final touches. We remove the legend using the HasLegend
property.
The xlCategory
category passed in as the argument 1 to the Axes
method, together with the property TickLabelSpacing set to 1
, ensures that every element of our axis is displayed.
To finish formatting this chart, we remove the outline by setting the property Visible
of the Line
object to 0
.
And hereâs what weâll see:
Below we do nearly the same thing for the lucky stars.
And to finalize we create a time series graph displaying the evolution of the jackpot.
And we fix the vertical axisâ labelsâ format by setting the TickLabels property NumberFormat
to the desired appearance.
And weâre done! Now we save the file and exit our Excel instance.
Hopefully, this xlwings tutorial has been helpful!
Some useful resources for learning about xlwings
are the official documentation, this formatting cheat sheet, the VBA for Excel documentation and the course xlwings: Python for Excel designed by Felix Zumstein himself, the developer of xlwings
.