Reading Excel (xlsx) Files
Your Guide to Reading Excel (xlsx) Files in Python
Last updated
Your Guide to Reading Excel (xlsx) Files in Python
Last updated
In this brief Python tutorial, we are going to learn how to read Excel (xlsx) files using Python. Specifically, we will read xlsx files in Python using the Python module openpyxl. First, we start by the simplest example of reading a xlsx file in Python. Second, we will learn how to read multiple Excel files using Python.
In previous posts, we have learned how to use Pandas read_excel method to import xlsx files with Python. As previously mentioned, however, we will use another package called openpyxl in this post. In the next paragraph, we will learn how to install openpyxl.
Basically, here’s the simplest form of using openpyxl for reading a xlsx file in Python:
It is, of course, also possible to learn how to read, write, and append to files in Python (e.g., text files). Make sure to check that post out, as well.
Now, before we will learn what Openpyxl is we need to make sure that we have both Python 3 and the module openpyxl installed. One easy way to install Python is to download a Python distribution such as Anaconda or ActivePython. Openpyxl, on the other hand, can as with many Python packages, be installed using both pip and conda. Now, using pip we type the following in a command prompt, or terminal window, pip install openpyxl
and using conda we type this; conda install openpyxl
. Note, sometimes when we are installing Python packages with pip, we may notice that we don’t have the latest version of pip. Luckily, it is quite easy to upgrade pip to the latest version using pip.
Openpyxl is a Python module that can be used for reading and writing Excel (with extension xlsx/xlsm/xltx/xltm) files. Furthermore, this module enables a Python script to modify Excel files. For instance, if we want togo through thousands of rows but just read certain data points and make small changes to these points, we can do this based on some criteria with openpyxl.
Now, the general method for reading xlsx files in Python (with openpyxl) is to import openpyxl (import openpyxl
) and then read the workbook: wb = openpyxl.load_workbook(PATH_TO_EXCEL_FILE)
. In this post, we will learn more about this, of course.
Now, in this section, we will be reading a xlsx file in Python using openpyxl. In a previous section, we have already been familiarized with the general template (syntax) for reading an Excel file using openpyxl and we will now get into this module in more detail. Note, we will also work with the Path method from the Pathlib module.
In the first step, to reading a xlsx file in Python, we need to import the modules we need. That is, we will import Path and openpyxl:
In the second step, we will create a variable using Path. Furthermore, this variable will point at the location and filename of the Excel file we want to import with Python:
Note, “SimData” is a subdirectory to that of the Python script (or notebook). That is, if we were to store the Excel file in a completely different directory, we need to put in the full path. For example,
if the data is stored in the Documents in our home directory.
In the third step, we are going to read the xlsx file. Now, we are using the load_workbook() method:
Now, in the fourth step, we are going to read the active sheet using the active method:
Note, if we know the sheet name we can also use this to read the sheet we want: play_data = wb_obj['play_data']
In the final, and fifth step, we can work, or manipulate, the Excel sheet we have imported with Python. For example, if we want to get the value from a specific cell we can do as follows:
Another example, on what we can do with the spreadsheet in Python, is that we can iterate through the rows and print them:
Note, that we used the max_row and set it to 6 to print the 6 first row from the Excel file.
In the sixth, and bonus step, we are going to find out how many rows and columns we have in the example Excel file we have imported with Python:
Now, before we learn how to read multiple xlsx files we are going to import data from Excel and into a Python dictionary. It’s quite simple, but for the example below, we need to know the column names before we start. If we want to find out the column names we can run the following code (or just open the Excel file):
In this section, we will finally read the Excel file using Python and create a dictionary.
Now, let’s walk through the code example above. First, we create a Python dictionary (data). Second, we loop through each row (using iter_rows) and we only go through the rows where there are values. Second, we have an if statement where we check if it’s the first row and we add the keys to the dictionary. That is, we set the column names as keys. Third, we append the data to each key (column name) in the else statement.
In this section, we will learn how to read multiple xlsx files in Python using openpyxl. Additionally to openpyxl and Path, we are also going to work with the os module.
In the first step, we are going to import the modules Path, glob, and openpyxl:
Second, we are going to read all the .xlsx files in a subdirectory into a list. Now, we use the glob module together with Path:
Third, we can now read all the xlsx files using Python. Again, we will use the load_workbook method. However, this time we will loop through each file we found in the subdirectory,
Now, in the code examples above, we are using Python list comprehension (twice, in both step 2 and 3). First, we create a list of all the xlsx files in the “XLSX_FILES” directory. Second, we loop through this list and create a list of workbooks. Of course, we could add this to the first line of code above.
In the fourth step, we can now work with the imported excel files. For example, we can get the first file by adding “[0]” to the list. If we want to know the sheet names of this file we do like this:wbs[0].sheetnames
.That is, many of the things we can do, and have done in the previous example on reading xlsx files in Python, can be done when we’ve read multiple Excel files.
In this post, we have learned how to:
Read an Excel file in Python using openpyxl
Read a xlsx file to a Python dictionary
Read multiple Excel fils in Python
It is if course possible to import data from a range of other file formats. For instance, read the post about parsing json files in Python to learn more about reading JSON files.
Source : https://www.marsja.se/your-guide-to-reading-excel-xlsx-files-in-python/