Pandas Style API
Mon 13 May 2019
I have been working on a side project so I have not had as much time to blog. Hopefully I will be able to share more about that project soon.
In the previous episode, we used Pandas Profiling to help with Exploratory Data Analysis (EDA). But what if we want to change the data a little, or plot a chart that Pandas Profiling does not provide? We can use Pandas_UI to help.
In the meantime, I wanted to write an article about styling output in pandas. The API for styling is somewhat new and has been under very active development. It contains a useful set of tools for styling the output of your pandas DataFrames and Series. In my own usage, I tend to only use a small subset of the available options but I always seem to forget the details. This article will show examples of how to format numbers in a pandas DataFrame and use some of the more advanced pandas styling visualization options to improve your ability to analyze data with pandas.
The basic idea behind styling is that a user will want to modify the way the data is presented but still preserve the underlying format for further manipulation.
Pandas_UI is a user interface tool that makes it convenient to manage and edit data in a Pandas DataFrame quickly and easily.
The most straightforward styling example is using a currency symbol when working with currency values. For instance, if your data contains the value 25.00, you do not immediately know if the value is in dollars, pounds, euros or some other currency. If the number is $25 then the meaning is clear.
We select the rows, columns, conditions and operations we want, and Pandas_UI generates the Python code for us, so we do not have to remember function names or parameter names.
Percentages are another useful example where formatting the output makes it simpler to understand the underlying analysis. For instance, which is quicker to understand: .05 or 5%? Using the percentage sign makes it very clear how to interpret the data.
Pandas_UI is built with technologies we are already familiar with, such as NumPy, plotly, ipywidgets, pandas_profiling and qgrid. It is packaged as a Jupyter Notebook extension, so it runs directly in Jupyter Notebook.
The key item to keep in mind is that styling presents the data so a human can read it but keeps the data in the same pandas data type so you can perform your normal pandas math, date or string functions.
Pandas styling also includes more advanced tools to add colors or other visual elements to the output. The pandas documentation has some really good examples but it may be a bit overwhelming if you are just getting started. The rest of this article will go through examples of using styling to improve the readability of your final analysis.
With just 3 lines of code.
Let's get started by looking at some data. For this example we will use some 2018 sales data for a fictitious organization. We will pretend to be an analyst looking for high-level sales trends for 2018. All of the data and the example notebook are on GitHub. Please note that the styling does not seem to render properly on GitHub, but if you download the notebooks it should look fine.
Check it out on github Last updated: 02/07/2020 11:16:12
Import the necessary libraries and read in the data:
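The spreadsheet that the original notebook reads is not reproduced here. As a stand-in, the minimal sketch below builds a DataFrame from the five transaction lines shown in this article. The variable name `df` and the choice to construct the data inline are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for the article's sales file: the five sample rows shown in the text.
df = pd.DataFrame(
    {
        "account number": [740150, 714466, 218895, 307599, 412290],
        "name": [
            "Barton LLC",
            "Trantow-Barrows",
            "Kulas Inc",
            "Kassulke, Ondricka and Metz",
            "Jerde-Hilpert",
        ],
        "sku": ["B1-20000", "S2-77896", "B1-69924", "S1-65481", "S2-34077"],
        "quantity": [39, -1, 23, 41, 6],
        "unit price": [86.69, 63.16, 90.70, 21.05, 83.21],
        "ext price": [3380.91, -63.16, 2086.10, 863.05, 499.26],
        "date": pd.to_datetime(
            [
                "2018-01-01 07:21:51",
                "2018-01-01 10:00:47",
                "2018-01-01 13:24:58",
                "2018-01-01 15:05:22",
                "2018-01-01 23:26:55",
            ]
        ),
    }
)

# Confirm the column types: numbers stay numeric and dates are real timestamps.
print(df.dtypes)
```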
The data includes sales transaction lines that look like this:
  | account number | name | sku | quantity | unit price | ext price | date |
0 | 740150 | Barton LLC | B1-20000 | 39 | 86.69 | 3380.91 | 2018-01-01 07:21:51 |
1 | 714466 | Trantow-Barrows | S2-77896 | -1 | 63.16 | -63.16 | 2018-01-01 10:00:47 |
2 | 218895 | Kulas Inc | B1-69924 | 23 | 90.70 | 2086.10 | 2018-01-01 13:24:58 |
3 | 307599 | Kassulke, Ondricka and Metz | S1-65481 | 41 | 21.05 | 863.05 | 2018-01-01 15:05:22 |
4 | 412290 | Jerde-Hilpert | S2-34077 | 6 | 83.21 | 499.26 | 2018-01-01 23:26:55 |
Given this data, we can do a quick summary to see how much the customers have purchased from us and what their average purchase amount looks like:
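A minimal sketch of that summary, assuming the column names `name` and `ext price` from the sample rows. The small `sales` frame is hypothetical: it reuses three amounts from the sample and adds two made-up repeat purchases so that the mean and the sum differ.

```python
import pandas as pd

# Hypothetical transactions: the 1000.00 and 500.00 rows are made up.
sales = pd.DataFrame(
    {
        "name": ["Barton LLC", "Barton LLC", "Kulas Inc", "Kulas Inc", "Jerde-Hilpert"],
        "ext price": [3380.91, 1000.00, 2086.10, 500.00, 499.26],
    }
)

# One row per customer, with the average and total purchase amount.
summary = sales.groupby("name")["ext price"].agg(["mean", "sum"])
print(summary)
```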
name | mean | sum |
Barton LLC | 1334.615854 | 109438.50 |
Cronin, Oberbrunner and Spencer | 1339.321642 | 89734.55 |
Frami, Hills and Schmidt | 1438.466528 | 103569.59 |
Fritsch, Russel and Anderson | 1385.366790 | 112214.71 |
Halvorson, Crona and Champlin | 1206.971724 | 70004.36 |
Install the pandas_ui library and enable the Jupyter Notebook extension.
For the sake of simplicity, I am only showing the top 5 items and will continue to truncate the data through the article to keep it short.
As you look at this data, it gets a bit challenging to understand the scale of the numbers because you have 6 decimal places and somewhat large numbers. Also, it is not immediately clear if this is in dollars or some other currency. We can fix that using the DataFrame style.format.
Here is what it looks like now:
Import the library we installed above.
Using the format function, we can use all the power of Python's string formatting tools on the data. In this case, we use ${0:,.2f} to place a leading dollar sign, add commas and round the result to 2 decimal places.
For example, if we want to round to 0 decimal places, we can change the format to ${0:,.0f}.
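These format strings are ordinary Python format specifications, so they can be tried with `str.format` before being handed to the Styler. A small sketch, using one value from the summary table above and the 5% example from earlier in the article:

```python
value = 1334.615854

# Leading dollar sign, thousands separator, two decimal places.
two_decimals = "${0:,.2f}".format(value)

# Same idea, rounded to whole dollars.
no_decimals = "${0:,.0f}".format(value)

# The % presentation type multiplies by 100 and appends a percent sign.
percent = "{:.0%}".format(0.05)

print(two_decimals, no_decimals, percent)  # $1,334.62 $1,335 5%
```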
In this case we will use a tabular dataset, the Adult Dataset.
If you are like me and always forget how to do this, I found the Python String Format Cookbook to be a good quick reference. String formatting is one of those syntax elements that I always forget, so I'm hoping this article will help others too.
Run ls to see which files are there.
Now that we have done some basic styling, let's expand this analysis to show off some more styling skills.
If we want to look at total sales by each month, we can use the grouper to summarize by month and also calculate how much each month is as a percentage of the total annual sales.
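A minimal sketch of the monthly summary, using a hypothetical four-row `sales` frame (the dates after January are made up). The frequency alias `"MS"`, which labels each group by the first day of the month, and the column name `pct_of_total` are assumptions for this sketch.

```python
import pandas as pd

# Hypothetical transactions spread over three months.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2018-01-01 07:21:51",
                "2018-01-15 10:00:00",
                "2018-02-03 09:30:00",
                "2018-03-20 16:45:00",
            ]
        ),
        "ext price": [3380.91, 2086.10, 863.05, 499.26],
    }
)

# Group the rows into calendar months and total the sales in each.
monthly_sales = (
    sales.groupby(pd.Grouper(key="date", freq="MS"))["ext price"]
    .agg(["sum"])
    .reset_index()
)

# Each month's share of the overall total, stored as a fraction (0 to 1).
monthly_sales["pct_of_total"] = monthly_sales["sum"] / sales["ext price"].sum()
print(monthly_sales)
```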
We know how to style our numbers but now we have a combination of dates, percentages and currency. Fortunately we can use a dictionary to define a unique formatting string for each column. This is really handy and powerful.
Open the CSV file directly.
I think that is pretty cool. When developing final output reports, having this type of flexibility is pretty useful. Astute readers may have noticed that we don't show the index in this example. The hide_index function suppresses the display of the index, which is useful in many cases. Note that in pandas 1.4 and later the same effect is spelled hide(axis="index"), and hide_index was removed in pandas 2.0.
In addition to styling numbers, we can also style the cells in the DataFrame. Let's highlight the highest number in green and the lowest number in the color Trinidad (#cd4f39).
Main screen
One item to highlight is that I am using method chaining to string together multiple function calls at one time. This is a very powerful approach for analyzing data and one I encourage you to use as you get further in your pandas proficiency. I recommend Tom Augspurger's post to learn much more about this topic.
Another useful function is background_gradient, which can highlight the range of values in a column.
Delete the selected rows
Edit history: we can take the Python code that was generated for us and reuse it directly.
The above example illustrates the use of the subset parameter to apply functions to only a single column of data. In addition, the cmap argument allows us to choose a color palette for the gradient. The matplotlib documentation lists all the available options.
Edit data in the DataFrame directly
The pandas styling function also supports drawing bar charts within the columns.
Here's how to do it:
It has a built-in Pandas Profiling Report function
This example introduces the bar function and some of the parameters to configure the way it is displayed in the table. Finally, this includes the use of set_caption to add a simple caption to the top of the table.
Select one column to plot a histogram. We can take the Python code that was generated for us and reuse it directly.
The next example is not using pandas styling but I think it is such a cool example that I wanted to include it. This specific example is from Peter Baumgartner and uses the sparkline module to embed a tiny chart in the summary DataFrame.
Here's the sparkline function:
Supports 3D charts
We can then call this function like a standard aggregation function:
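The original function relies on the third-party sparkline module, which is not reproduced here. The sketch below is a stand-in rather than that function: it bins each group's values with `np.histogram` and maps the bin counts to Unicode block characters by hand. The `BLOCKS` constant, the scaling rule and the sample data are all assumptions.

```python
import numpy as np
import pandas as pd

# Eight block characters, from lowest to highest bar.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline_str(values):
    """Histogram the values into 10 bins and draw one block per bin."""
    counts = np.histogram(values)[0]
    top = counts.max()
    if top == 0:
        return BLOCKS[0] * len(counts)
    # Scale each count to an index 0..7 into BLOCKS.
    levels = counts * (len(BLOCKS) - 1) // top
    return "".join(BLOCKS[i] for i in levels)


# pandas uses the function name as the column label in agg().
sparkline_str.__name__ = "sparkline"

# Hypothetical transactions; only some values come from the sample rows.
sales = pd.DataFrame(
    {
        "name": ["Barton LLC"] * 4 + ["Kulas Inc"] * 3,
        "quantity": [39, 12, 25, 40, 23, 5, 18],
        "ext price": [3380.91, 950.40, 2100.00, 3500.25, 2086.10, 410.55, 1500.00],
    }
)

# Use the function alongside a built-in aggregation.
result = sales.groupby("name")[["quantity", "ext price"]].agg(["mean", sparkline_str])
print(result)
```

Because the result is plain text, the sparkline column survives anywhere a string does, including CSV exports.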
Example screen for plotting a 3D chart (Scatter 3D). We do not need to remember parameter names.
name | quantity mean | quantity sparkline | ext price mean | ext price sparkline |
Barton LLC | 24.890244 | [sparkline] | 1334.615854 | [sparkline] |
Cronin, Oberbrunner and Spencer | 24.970149 | [sparkline] | 1339.321642 | [sparkline] |
Frami, Hills and Schmidt | 26.430556 | [sparkline] | 1438.466528 | [sparkline] |
Fritsch, Russel and Anderson | 26.074074 | [sparkline] | 1385.366790 | [sparkline] |
Halvorson, Crona and Champlin | 22.137931 | [sparkline] | 1206.971724 | [sparkline] |
Screen for building a command that updates many rows at once according to the specified conditions.
I think this is a really useful function that can be used to concisely summarize data. The other interesting component is that this is all just text: you can see the underlying bars as lines in the raw HTML. It's kind of wild.
The pandas style API is a welcome addition to the pandas library. It is really useful when you get towards the end of your data analysis and need to present the results to others. There are a few tricky components to string formatting, so hopefully the items highlighted here are useful to you. There are other useful functions in this library, but sometimes the documentation can be a bit dense, so I am hopeful this article will get you started and you can use the official documentation as you dive deeper into the topic.
Finally, thanks to Alexas_Fotos for the nice title image.
Reference : https://pbpython.com/styling-pandas.html
This library is very new and still has bugs in several places.
If an error occurs, re-run the cell.
It does not currently support Google Colab.
Reference : https://www.bualabs.com/archives/4299/pandas_ui-pandas-dataframe-user-interface-tools-pandas-ep-7/#more-4299