Combine Multiple Excel Worksheets
Mon 26 August 2019
Last updated
Mon 26 August 2019
Last updated
One of the most commonly used pandas functions is read_excel
. This short article shows how you can read in all the tabs in an Excel workbook and combine them into a single pandas dataframe using one command.
For those of you that want the TLDR, here is the command:
Read on for an explanation of when to use this and how it works.
For the purposes of this example, we assume that the Excel workbook is structured like this:
The process I will describe works when:
The data is not duplicated across tabs (sheet1 is one full month and the subsequent sheets have only a single monthâs worth of data)
The columns are all named the same
You wish to read in all tabs and combine them
The read_excel
function is a feature packed pandas function. For this specific case, we can use the sheet_name
parameter to streamline the reading in of all the sheets in our Excel file.
Most of the time, you will read in a specific sheet from an Excel file:
If you carefully look at the documentation, you may notice that if you use sheet_name=None
, you can read in all the sheets in the workbook at one time. Letâs try it:
Pandas will read in all the sheets and return a collections.OrderedDict
object. For the purposes of the readability of this article, Iâm defining the full url and passing it to read_excel
. In practice, you may decide to make this one command.
Letâs inspect the resulting all_dfs
:
If you want to access a single sheet as a dataframe:
account number | name | sku | quantity | unit price | ext price | date | |
0 | 412290 | Jerde-Hilpert | S2-77896 | 43 | 76.66 | 3296.38 | 2018-03-04 23:10:28 |
1 | 383080 | Will LLC | S1-93683 | 28 | 90.86 | 2544.08 | 2018-03-05 05:11:49 |
2 | 729833 | Koepp Ltd | S1-30248 | 13 | 44.84 | 582.92 | 2018-03-05 17:33:52 |
3 | 424914 | White-Trantow | S2-82423 | 38 | 50.93 | 1935.34 | 2018-03-05 21:40:10 |
4 | 672390 | Kuhn-Gusikowski | S1-50961 | 34 | 48.20 | 1638.80 | 2018-03-06 11:59:00 |
If we want to join all the individual dataframes into one single dataframe, use pd.concat:
In this case, we use ignore_index
since the automatically generated indices of Sheet1
, Sheet2
, etc. are not meaningful.
If your data meets the structure outlined above, this one liner will return a single pandas dataframe that combines the data in each Excel worksheet:
This trick can be useful in the right circumstances. It also illustrates how much power there is in a pandas command that âjustâ reads in an Excel file. The full notebook is available on github if you would like to try it out for yourself.
Reference : https://pbpython.com/pandas-excel-tabs.html