📒
Knowledge-Base
  • Knowledge Base
  • Tutorials
    • Python
      • Introduction
      • Important basic syntax
      • Awesome Python
      • Python 101
      • Python Cheat sheet
      • āđ‚āļ„āļĢāļ‡āļŠāļĢāđ‰āļēāļ‡āļ‚āļ­āļ‡āļ āļēāļĐāļē
      • Library & Package
      • Variable & Data Types
      • Lists
      • Dictionary
      • Function
      • Built-in Function
        • enumerate()
      • Modules
      • Classes & Objects
      • Inheritance
      • Date & Time
      • āļāļēāļĢāđƒāļŠāđ‰āļ‡āļēāļ™ Virtualenv
    • Pandas
      • Learning Pandas Second Edition
        • 2. Running with pandas
        • 3. Data with the Series
        • 4. Create DataFrame
        • 5. Manipulating DataFrame Structure
  • e-Book
    • Tech
      • Automate the Boring Stuff
      • A Whirlwind Tour of Python
  • Innovation & Tech
    • Python
    • Pandas
      • 10 Pandas tips
    • Web Scraping
      • Web Scraping 101
      • Requests and BeautifulSoup
  • Industry
    • 20 āđāļ™āļ§āļ„āļīāļ”āļ‚āļēāļĒāļ‚āļ­āļ‡āļ­āļ­āļ™āđ„āļĨāļ™āđŒ
    • āđāļœāļ™āļĢāļ°āļĒāļ°āļĒāļēāļ§āļ‚āļ­āļ‡ Toyota
    • āđ‚āļĨāļāļŦāļĨāļąāļ‡āļĒāļļāļ„āđ‚āļĨāļāļēāļ āļīāļ§āļąāļ’āļ™āđŒ
  • Opinion
    • āļšāļĢāļĢāļĒāļ‡ āļžāļ‡āļĐāđŒāļžāļēāļ™āļīāļŠ: āļāļąāļšāļ”āļąāļāļĢāļąāļāļĢāļēāļŠāļāļēāļĢ 4.0
    • āļ›āļąāļāļāļēāļ āļīāļāđ‚āļ āļ“ Wongnai WeShare
    • āļ›āļĢāļ°āđ€āļ—āļĻāđ„āļ—āļĒāđƒāļ™āļ„āļ§āļēāļĄāļ„āļīāļ” āļ„āļ§āļēāļĄāļ„āļīāļ”āđƒāļ™āļ›āļĢāļ°āđ€āļ—āļĻāđ„āļ—āļĒ
    • āļāļēāļĢāļŠāļ–āļēāļ›āļ™āļē ‘āļĢāļąāļāļšāļĢāļĢāļĐāļąāļ—āļ­āļģāļ™āļēāļˆāļ™āļīāļĒāļĄâ€™ āđƒāļ™āļŠāļąāļ‡āļ„āļĄāđ„āļ—āļĒ
  • People
    • “āļ„āļ§āļēāļĄāļŠāļģāđ€āļĢāđ‡āļˆ āļ”āļĩāđƒāļˆāđ„āļ”āđ‰āļ§āļąāļ™āđ€āļ”āļĩāļĒāļ§â€
    • āđ‚āļ‹āđ€āļŠāļĩāļĒāļĨāļĄāļĩāđ€āļ”āļĩāļĒ āđƒāļ™āļĄāļļāļĄāļĄāļ­āļ‡āļ‚āļ­āļ‡ āļĄāļēāļĢāđŒāļ āļ‹āļąāļāđ€āļ„āļ­āļĢāđŒāđ€āļšāļīāļĢāđŒāļ
  • Parent
    • ADULTIFICTION
    • āļ„āļ§āļēāļĄāļ‰āļĨāļēāļ”āļŠāļĢāđ‰āļēāļ‡āđ„āļ”āđ‰
    • āļāļēāļĢāđ€āļĢāļĩāļĒāļ™āļĢāļđāđ‰āļ‚āļ­āļ‡āļĨāļđāļāđƒāļ™āļ§āļąāļ™āļ™āļĩāđ‰
    • CODING āļ„āļ·āļ­āļ­āļ°āđ„āļĢ
    • āļŠāļ­āļ™ CODING āļ­āļĒāđˆāļēāļ‡āđ„āļĢāđƒāļŦāđ‰āļ‡āđˆāļēāļĒ
    • 8 āļ‚āđ‰āļ­āļ„āļĢāļđāļ„āļ§āļĢāļĢāļđāđ‰ āđ€āļĄāļ·āđˆāļ­āļˆāļąāļ”āļāļēāļĢāđ€āļĢāļĩāļĒāļ™āļĢāļđāđ‰āļœāđˆāļēāļ™āļ›āļĢāļ°āļŠāļšāļāļēāļĢāļ“āđŒ
  • Lift
    • āļ„āļļāļ“āļĢāļđāđ‰āļŠāļķāļāļ§āđˆāļē āđ‚āļĨāļāļ—āļļāļāļ§āļąāļ™āļ™āļĩāđ‰āļŦāļĄāļļāļ™āđ€āļĢāđ‡āļ§āđāļĨāļ°āđāļ„āļšāļĨāļ‡āļŦāļĢāļ·āļ­āđ€āļ›āļĨāđˆāļē?
    • āļ›āļąāļˆāļˆāļļāļšāļąāļ™āđ€āļĢāļēāļ•āđ‰āļ­āļ‡āđ€āļœāļŠāļīāļāļāļąāļšāļ„āļ§āļēāļĄāļ—āđ‰āļēāļ—āļēāļĒāļ­āļ°āđ„āļĢāļšāđ‰āļēāļ‡
    • āļāļŽ 40% āļ‚āļ­āļ‡āļŦāļ™āđˆāļ§āļĒ SEAL
    • e-Book
      • Sapiens – A Brief History of Humankind
        • [āļŠāļĢāļļāļ›] āđ‚āļŪāđ‚āļĄ āđ€āļ‹āđ€āļ›āļĩāļĒāļ™āļŠāđŒ āļŠāļąāļ•āļ§āđŒāļĄāļŦāļąāļĻāļˆāļĢāļĢāļĒāđŒāđāļĨāļ°āļ–āļīāđˆāļ™āļ—āļĩāđˆāļ­āļĒāļđāđˆ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 1- āļāļģāđ€āļ™āļīāļ” Homo Sapiens
        • āļ•āļ­āļ™āļ—āļĩāđˆ 2 – āļŠāļīāđˆāļ‡āļ—āļĩāđˆāļ—āļģāđƒāļŦāđ‰āđ€āļĢāļēāļ„āļĢāļ­āļ‡āđ‚āļĨāļ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 3 – āļĒāļļāļ„āđāļŦāđˆāļ‡āļāļēāļĢāļĨāđˆāļēāļŠāļąāļ•āļ§āđŒāđ€āļāđ‡āļšāļžāļ·āļŠāļœāļĨ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 4 – āļāļēāļĢāļŦāļĨāļ­āļāļĨāļ§āļ‡āļ„āļĢāļąāđ‰āļ‡āļĒāļīāđˆāļ‡āđƒāļŦāļāđˆ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 5 – āļ„āļļāļāļ—āļĩāđˆāļĄāļ­āļ‡āđ„āļĄāđˆāđ€āļŦāđ‡āļ™
        • āļ•āļ­āļ™āļ—āļĩāđˆ 6 – āļāļģāđ€āļ™āļīāļ”āļ āļēāļĐāļēāđ€āļ‚āļĩāļĒāļ™
        • āļ•āļ­āļ™āļ—āļĩāđˆ 7 – āļ„āļ§āļēāļĄāđ€āļŦāļĨāļ·āđˆāļ­āļĄāļĨāđ‰āļģ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 8 – āđ‚āļĨāļāļ—āļĩāđˆāļ–āļđāļāļŦāļĨāļ­āļĄāļĢāļ§āļĄ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 9 – āļĄāļ™āļ•āļĢāļēāļ‚āļ­āļ‡āđ€āļ‡āļīāļ™āļ•āļĢāļē
        • āļ•āļ­āļ™āļ—āļĩāđˆ 10 – āļˆāļąāļāļĢāļ§āļĢāļĢāļ”āļī
        • āļ•āļ­āļ™āļ—āļĩāđˆ 11 – āļšāļ—āļšāļēāļ—āļ‚āļ­āļ‡āļĻāļēāļŠāļ™āļē
        • āļ•āļ­āļ™āļ—āļĩāđˆ 12 – āļĻāļēāļŠāļ™āļēāđ„āļĢāđ‰āļžāļĢāļ°āđ€āļˆāđ‰āļē
        • āļ•āļ­āļ™āļ—āļĩāđˆ 13 – āļĒāļļāļ„āđāļŦāđˆāļ‡āļ„āļ§āļēāļĄāđ„āļĄāđˆāļĢāļđāđ‰
        • āļ•āļ­āļ™āļ—āļĩāđˆ 14 – 500 āļ›āļĩāđāļŦāđˆāļ‡āļ„āļ§āļēāļĄāļāđ‰āļēāļ§āļŦāļ™āđ‰āļē
        • āļ•āļ­āļ™āļ—āļĩāđˆ 15 – āđ€āļĄāļ·āđˆāļ­āļĒāļļāđ‚āļĢāļ›āļ„āļĢāļ­āļ‡āđ‚āļĨāļ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 16 – āļŠāļ§āļąāļŠāļ”āļĩāļ—āļļāļ™āļ™āļīāļĒāļĄ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 17 – āļˆāļēāļ™āļ­āļĨāļđāļĄāļīāđ€āļ™āļĩāļĒāļĄāļ‚āļ­āļ‡āļ™āđ‚āļ›āđ€āļĨāļĩāļĒāļ™
        • āļ•āļ­āļ™āļ—āļĩāđˆ 18 – āļ„āļĢāļ­āļšāļ„āļĢāļąāļ§āļĨāđˆāļĄāļŠāļĨāļēāļĒ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 19 – āļŠāļļāļ‚āļŠāļĄāļšāđˆāļĄāļīāļŠāļĄ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 20 – āļ­āļ§āļŠāļēāļ™ Sapiens
      • Homo Deus
        • [āļŠāļĢāļļāļ›āļŦāļ™āļąāļ‡āļŠāļ·āļ­] Homo Deus
        • āļ•āļ­āļ™āļ—āļĩāđˆ 1: āļŠāļēāļĄāļ§āļēāļĢāļ°āđƒāļŦāļĄāđˆāđāļŦāđˆāļ‡āļ­āļ™āļēāļ„āļ•
        • āļ•āļ­āļ™āļ—āļĩāđˆ 2: āļ„āļģāļŠāļēāļ›āđ€āļĢāļ·āđˆāļ­āļ‡āļ”āļĩāļ­āļļāļŠ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 3: āđ€āļ‹āđ€āļ›āļĩāļĒāļ™āļŠāđŒāļ„āļĢāļ­āļ‡āđ‚āļĨāļāđ„āļ”āđ‰āļ­āļĒāđˆāļēāļ‡āđ„āļĢ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 4: āļžāļĨāļąāļ‡āļ‚āļ­āļ‡āļˆāļīāļ•āļ§āļīāļŠāļąāļĒāļĢāđˆāļ§āļĄ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 5: āļ‚āđ‰āļ­āļ•āļāļĨāļ‡āđ€āļĢāļ·āđˆāļ­āļ‡āļ„āļ§āļēāļĄāļ—āļąāļ™āļŠāļĄāļąāļĒāļāļąāļšāđ€āļ—āļ§āļ—āļąāļ“āļ‘āđŒ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 6: āļ›āļĨāļēāļĒāļ—āļēāļ‡āļ‚āļ­āļ‡āļāļēāļĢāļ›āļāļīāļ§āļąāļ•āļīāļĄāļ™āļļāļĐāļĒāđŒāļ™āļīāļĒāļĄāļ„āļ·āļ­āļ­āļ āļīāļĄāļ™āļļāļĐāļĒāđŒ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 7: āđ„āļĄāđˆāļĄāļĩāļ—āļąāđ‰āļ‡āđ€āļˆāļ•āļˆāļģāļ™āļ‡āđ€āļŠāļĢāļĩāđāļĨāļ°āļ§āļīāļāļāļēāļ“āđƒāļ™āđ‚āļĨāļāļ‚āļ­āļ‡āļ‚āđ‰āļ­āļĄāļđāļĨāļ™āļīāļĒāļĄ (dataism)
        • āļ•āļ­āļ™āļ—āļĩāđˆ 8: āđ€āļ‹āđ€āļ›āļĩāļĒāļ™āļŠāđŒāļāļĨāļēāļĒāđ€āļ›āđ‡āļ™āļŠāļīāđˆāļ‡āļŠāļģāļĢāļļāļ”āļ—āļēāļ‡āļ›āļĢāļ°āļ§āļąāļ•āļīāļĻāļēāļŠāļ•āļĢāđŒāđ„āļ”āđ‰āļ­āļĒāđˆāļēāļ‡āđ„āļĢ
        • āļ•āļ­āļ™āļ—āļĩāđˆ 9: āļĄāļīāļˆāļ‰āļēāļ—āļīāļāļīāļ—āļĩāđˆāļĢāđ‰āļēāļĒāđāļĢāļ‡āļ—āļĩāđˆāļŠāļļāļ”āđƒāļ™āļĒāļļāļ„āļ‚āļąāļ­āļĄāļđāļĨāļ™āļīāļĒāļĄ (dataism)
        • āļ•āļ­āļ™āļ—āļĩāđˆ 10: āļžāļĨāļąāļ‡āļāļļāļ“āļ‘āļēāļĨāļīāļ™āļĩ āļ„āļ·āļ­āđ€āļŠāđ‰āļ™āļ—āļēāļ‡āļŠāļđāđˆāļ”āđ‰āļēāļ™āļŠāļ§āđˆāļēāļ‡āļ‚āļ­āļ‡ Homo Deus
        • āļ•āļ­āļ™āļ—āļĩāđˆ 11: āļ—āļĪāļĐāļŽāļĩāđāļŦāđˆāļ‡āļŠāļĢāļĢāļžāļŠāļīāđˆāļ‡â€‹(Theory​ of​ Everything)​āļ‚āļ­āļ‡āļĨāļąāļ—āļ˜āļīāļ‚āđ‰āļ­āļĄāļđāļĨāļ™āļīāļĒāļĄāļāļąāļšâ€‹āļ§āļīāļ–āļĩāđāļŦāđˆāļ‡āļ•āļąāļ§āļ•āļ™
        • āļ•āļ­āļ™āļ—āļĩāđˆ 12: āđ€āļĢāļēāļ•āđ‰āļ­āļ‡āļāđ‰āļēāļ§āļ‚āđ‰āļēāļĄāđāļ•āđˆāļŦāļĨāļ­āļĄāļĢāļ§āļĄāļĨāļąāļ—āļ˜āļīāļ‚āđ‰āļ­āļĄāļđāļĨāļ™āļīāļĒāļĄ
  • See Behind the FX rate
  • Obtaining Stock Prices
  • Monte Carlo Simulation in Finance Python Part-2
  • The Easiest Data Cleaning Method using Python & Pandas
  • How to use iloc and loc for Indexing and Slicing Pandas Dataframes
  • Converting HTML to a Jupyter Notebook
  • Top 50 Tips & Tricks
Powered by GitBook
On this page
  • 10 Pandas tips to make data analysis faster
  • 1. Styling
  • 2. Pandas options
  • 3. Group by with multiple aggregations
  • 4. Column slicing
  • 5. Add row ID / random row ID to each group
  • 6. List all unique values in a group
  • 7. Add row total and column total to a numerical dataframe
  • 8. Check memory usage
  • 9. Cumulative sum
  • 10. Crosstab

Was this helpful?

  1. Innovation & Tech
  2. Pandas

10 Pandas tips

PreviousPandasNextWeb Scraping

Last updated 5 years ago

Was this helpful?

10 Pandas tips to make data analysis faster

1. Styling

Have you ever complained about the table output looks boring when you do .head() in Jupyter notebooks? Is there a way not to display indexes (especially when there is already an ID column)? There’re ways to fix these issues.

A. Highlight all negative values in a dataframe. (example revised from )

import pandas as pd  
def color_negative_red(val):  
    color = 'red' if val < 0 else 'black'  
 return 'color: %s' % colordf = pd.DataFrame(dict(col_1=[1.53,-2.5,3.53],   
                       col_2=[-4.1,5.9,0])  
                 )  
df.style.applymap(color_negative_red)
df = pd.DataFrame(np.random.randn(5, 3))  
df.style.set_table_styles(  
[{'selector': 'tr:hover',  
  'props': [('background-color', 'yellow')]}]  
)

D. More CSS styles. You can use CSS to change the appearance of the table.

df = pd.DataFrame(  
dict(departure=['SFO', 'SFO', 'LAX', 'LAX', 'JFK', 'SFO'],  
     arrival=['ORD', 'DFW', 'DFW', 'ATL', 'ATL', 'ORD'],  
     airlines=['Delta','JetBlue','Delta',’AA','SouthWest',    
               'Delta']),  
columns=['airlines', 'departure','arrival'])
df.style.set_table_styles(  
[{'selector': 'tr:nth-of-type(odd)',  
  'props': [('background', '#eee')]},   
 {'selector': 'tr:nth-of-type(even)',  
  'props': [('background', 'white')]},  
 {'selector': 'th',  
  'props': [('background', '#606060'),   
            ('color', 'white'),  
            ('font-family', 'verdana')]},  
 {'selector': 'td',  
  'props': [('font-family', 'verdana')]},  
]  
).hide_index()

2. Pandas options

The reader may have experienced the following issues when using .head(n) to check the dataframe: (1) There’re too many columns / rows in the dataframe and some columns / rows in the middle are omitted. (2) Columns containing long texts get truncated. (3) Columns containing floats display too many / too few digits.

One can set

import pandas as pd   
pd.options.display.max_columns = 50  # None -> No Restrictions  
pd.options.display.max_rows = 200    # None -> Be careful with this   
pd.options.display.max_colwidth = 100  
pd.options.display.precision = 3

to solve these issues.

3. Group by with multiple aggregations

In SQL we can do aggregations like

SELECT A, B, max(A), avg(A), sum(B), min(B), count(*)  
FROM table  
GROUP BY A, B

In Pandas it can be done with .groupby() and .agg():

import pandas as pd  
import numpy as np   
df = pd.DataFrame(dict(A=['coke', 'sprite', 'coke', 'sprite',  
                          'sprite', 'coke', 'coke'],  
                       B=['alpha','gamma', 'alpha', 'beta',  
                          'gamma', 'beta', 'beta'],  
                       col_1=[1,2,3,4,5,6,7],  
                       col_2=[1,6,2,4,7,9,3]))
tbl = df.groupby(['A','B']).agg({'col_1': ['max', np.mean],  
                                 'col_2': ['sum','min','count']})
# 'count' will always be the count for number of rows in each group.

And the result will look like this:

Both the rows and columns are multi-indexed. A quick solution to change it to a dataframe without multi-indices is

tbl = tbl.reset_index()  
tbl.columns = ['A', 'B', 'col_1_max', 'col_2_sum', 'col_2_min', 'count']

If you would like to have the column renaming process automated, you can do tbl.columns.get_level_values(0) and tbl.columns.get_level_values(1) to extract the indices in each level and combine them.

4. Column slicing

Some of you might be familiar with this already, but I still find it very useful when handling a dataframe with a ton of columns.

df.iloc[:,2:5].head()             # select the 2nd to the 4th column  
df.loc[:,'column_x':].head()     
# select all columns starting from 'column_x'

5. Add row ID / random row ID to each group

To add a row ID / random row ID for each group by A, B, one can first append an ID / random ID to all rows:

import numpy as np  
# df: target dataframe np.random.seed(0)   # set random seed  
df['random_ID_all'] = np.random.permutation(df.shape[0])  
df['ID_all'] = [i for i in range(1, df.shape[0]+1)]

To add a random ID to each group (by A, B), one can then do

df['ID'] = df.groupby(['A', 'B'])['ID_all'].rank(method='first',ascending=True).astype(int)
df['random_ID'] = df.groupby(['A', 'B'])'random_ID_all'].rank(method='first',ascending=True).astype(int)

to get

6. List all unique values in a group

Sometimes after we performed group by, we’d like to aggregate the values in the target column as a list of unique values instead of max, min, â€Ķetc. This is how it’s done.

df = pd.DataFrame(dict(A=['A','A','A','A','A','B','B','B','B'],  
                       B=[1,1,1,2,2,1,1,1,2],  
                       C=['CA','NY','CA','FL','FL',       
                          'WA','FL','NY','WA']))
tbl = df[['A', 'B', 'C']].drop_duplicates()\  
                         .groupby(['A','B'])['C']\  
                         .apply(list)\  
                         .reset_index() # list to string (separated by commas)   
tbl['C'] = tbl.apply(lambda x: (','.join([str(s) for s in x['C']])), axis = 1)

If you’d like to save the result, don’t forget to change the separator to anything other than commas.

7. Add row total and column total to a numerical dataframe

This is another common data manipulation. All you need is .apply().

df = pd.DataFrame(dict(A=[2,6,3],  
                       B=[2,2,6],   
                       C=[3,2,3]))
df['col_total']     = df.apply(lambda x: x.sum(), axis=1)  
df.loc['row_total'] = df.apply(lambda x: x.sum())

8. Check memory usage

`.memory_usage(deep=True)` can be used on Pandas dataframes to see the amount of memory used (in bytes) for each column. It’s useful when building machine learning models which may require a lot memory in training.

9. Cumulative sum

From time to time, cumulative sum is required when you generate some statistical outcomes. Simply do `

df['cumulative_sum'] = df['target_column'].cumsum()`  .

10. Crosstab

When you need to count the frequencies for groups formed by 3+ features, pd.crosstab() can make your life easier.

Thanks for reading! Comment below if you find bugs / better solutions.

B. Hide the index. Try df.head().style.hide_index()! C. Add hovering effects. (example revised from )

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.formats.style.Styler.set_table_styles.html
Source
https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html