📉
Tutorials
  • Computer History
  • Function
    • Finance
      • Calculate
    • Manage Data
    • Date&Time
    • Strings and Character
  • Snippets
    • Web Application
      • Hugo
      • JavaScript
        • Stopwatch using JavaScript?
    • Note
    • Start Project
      • GitHub
      • GitLab
    • Python Programming
      • Strings and Character Data
      • List
      • Dictionaries
    • Data Science
      • Setting Option
      • Get Data
  • Link Center
    • Next Articles
    • Google
    • Excel VBA
    • Python
      • Notebook
    • WebApp
      • Vue.js
    • Finance
    • Project
      • Kids
        • Scratch
      • Finance
        • Plotly.js
        • Portfolio
      • Mini Lab
        • Systems Administration
        • Auto Adjust Image
      • Sending Emails
      • ECS
        • Knowledge Base
        • ระบบผู้เชี่ยวชาญ (Expert System)
        • Check product
        • Compare two SQL databases
      • e-Library
        • Knowledge base
        • การจัดหมวดหมู่ห้องสมุด
        • Temp
      • AppSheet
        • บัญชีรายรับรายจ่าย
      • Weather App
      • COVID-19
  • Tutorials
    • Data Science
      • Data Science IPython notebooks
    • UX & UI
      • 7 กฎการออกแบบ UI
    • Web Scraping
      • Scrape Wikipedia Articles
      • Quick Start
    • GUI
      • pysimple
        • Create a GUI
      • Tkinter
        • Python Tkinter Tutorial
      • PyQt
        • PyQt Tutorial
    • MachineLearning
      • การพัฒนา Chat Bot
      • AI ผู้ช่วยใหม่ในการทำ Customer Segmentation
      • Customer Segmentation
      • ตัดคำภาษาไทย ด้วย PyThaiNLP API
    • Excel & VBA
      • INDEX กับ MATCH
      • รวมสูตร Excel ปี 2020
      • How to Write Code in a Spreadsheet
    • Visualization
      • Bokeh
        • Part I: Getting Started
        • Data visualization
        • Plotting a Line Graph
        • Panel Document
        • Interactive Data Visualization
    • VueJS
      • VueJS - Quick Guide
    • Django
      • Customize the Django Admin
      • พัฒนาเว็บด้วย Django
    • Git
      • วิธีสร้าง SSH Key
      • Git คืออะไร
      • เริ่มต้นใช้งาน Git
      • การใช้งาน Git และ Github
      • รวม 10 คำสั่ง Git
      • GIT Push and Pull
    • Finance
      • Stock Analysis using Pandas (Series)
      • Building Investment AI for fintech
      • Resampling Time Series
      • Python for Finance (Series)
      • Stock Data Analysis (Second Edition)
      • Get Stock Data Using Python
      • Stock Price Trend Analysis
      • Calculate Stock Returns
      • Quantitative Trading
      • Backtrader for Backtesting
      • Binance Python API
      • Pine Script (TradingView)
      • Stocks Analysis with Pandas and Scikit-Learn
      • Yahoo Finance API
      • Sentiment Analysis
      • yfinance Library
      • Stock Data Analysis
      • YAHOO_FIN
      • Algorithmic Trading
    • JavaScript
      • Split a number
      • Callback Function
      • The Best JavaScript Examples
      • File and FileReader
      • JavaScript Tutorial
      • Build Reusable HTML Components
      • Developing JavaScript components
      • JavaScript - Quick Guide
      • JavaScript Style Guide()
      • Beginner's Handbook
      • Date Now
    • Frontend
      • HTML
        • File Path
      • Static Site Generators.
        • Creating a New Theme
    • Flask
      • Flask - Quick Guide
      • Flask Dashboards
        • Black Dashboard
        • Light Blue
        • Flask Dashboard Argon
      • Create Flask App
        • Creating First Application
        • Rendering Pages Using Jinja
      • Jinja Templates
        • Primer on Jinja Templating
        • Jinja Template Document
      • Learning Flask
        • Ep.1 Your first Flask app
        • Ep.2 Flask application structure
        • Ep.3 Serving HTML files
        • Ep.4 Serving static files
        • Ep.5 Jinja template inheritance
        • Ep.6 Jinja template design
        • Ep.7 Working with forms in Flask
        • Ep.8 Generating dynamic URLs in Flask
        • Ep.9 Working with JSON data
        • Ep.23 Deploying Flask to a VM
        • Ep.24 Flask and Docker
        • Ep. 25: uWSGI Introduction
        • Ep. 26 Flask before and after request
        • Ep. 27 uWSGI Decorators
        • Ep. 28 uWSGI Decorators
        • Ep. 29 Flask MethodView
        • Ep. 30 Application factory pattern
      • The Flask Mega-Tutorial
        • Chapter 2: Templates
      • Building Flask Apps
      • Practical Flask tutorial series
      • Compiling SCSS to CSS
      • Flask application structure
    • Database
      • READING FROM DATABASES
      • SQLite
        • Data Management
        • Fast subsets of large datasets
      • Pickle Module
        • How to Persist Objects
      • Python SQL Libraries
        • Create Python apps using SQL Server
    • Python
      • Python vs JavaScript
      • Python Pillow – Adjust Image
      • Python Library for Google Search
      • Python 3 - Quick Guide
      • Regular Expressions
        • Python Regular Expressions
        • Regular Expression (RegEx)
        • Validate ZIP Codes
        • Regular Expression Tutorial
      • Python Turtle
      • Python Beginner's Handbook
      • From Beginner to Pro
      • Standard Library
      • Datetime Tutorial
        • Manipulate Times, Dates, and Time Spans
      • Work With a PDF
      • geeksforgeeks.org
        • Python Tutorial
      • Class
      • Modules
        • Modules List
        • pickle Module
      • Working With Files
        • Open, Read, Append, and Other File Handling
        • File Manipulation
        • Reading & Writing to text files
      • Virtual Environments
        • Virtual Environments made easy
        • Virtual Environmen
        • A Primer
        • for Beginners
      • Functions
        • Function Guide
        • Inner Functions
      • Learning Python
        • Pt. 4 Python Strings
        • Pt. 3 Python Variables
      • Zip Function
      • Iterators
      • Try and Except
        • Exceptions: Introduction
        • Exceptions Handling
        • try and excep
        • Errors and Exceptions
        • Errors & Exceptions
      • Control Flow
      • Lambda Functions
        • Lambda Expression คืออะไร
        • map() Function
      • Date and Time
        • Python datetime()
        • Get Current Date and Time
        • datetime in Python
      • Awesome Python
      • Dictionary
        • Dictionary Comprehension
        • ALL ABOUT DICTIONARIES
        • DefaultDict Type for Handling Missing Keys
        • The Definitive Guide
        • Why Functions Modify Lists and Dictionaries
      • Python Structures
      • Variable & Data Types
      • List
        • Lists Explained
        • List Comprehensions
          • Python List Comprehension
          • List Comprehensions in 5-minutes
          • List Comprehension
        • Python List
      • String
        • Strings and Character Data
        • Splitting, Concatenating, and Joining Strings
      • String Formatting
        • Improved String Formatting Syntax
        • String Formatting Best Practices
        • Remove Space
        • Add Spaces
      • Important basic syntax
      • List all the packages
      • comment
    • Pandas
      • Tutorial (GeeksforGeeks)
      • 10 minutes to pandas
      • Options and settings
      • เริ่มต้น Set Up Kaggle.com
      • Pandas - Quick Guide
      • Cookbook
      • NumPy
        • NumPy Package for Scientific
      • IO tools (text, CSV, …)
      • pandas.concat
      • Excel & Google Sheets
        • A Guide to Excel
        • Quickstart to the Google Sheets
        • Python Excel Tutorial: The Definitive Guide
      • Working With Text Data
        • Quickstart
      • API Reference
      • Groupby
      • DateTime Methods
      • DataFrame
      • sort_values()
      • Pundit: Accessing Data in DataFrames
      • datatable
        • DataFrame: to_json()
        • pydatatable
      • Read and Write Files
      • Data Analysis with Pandas
      • Pandas and Python: Top 10
      • 10 minutes to pandas
      • Getting Started with Pandas in Python
    • Markdown
      • Create Responsive HTML Emails
      • Using Markup Languages with Hugo
    • AngularJS
      • Learn AngularJS
    • CSS
      • The CSS Handbook
      • Box Shadow
      • Image Center
      • The CSS Handbook
      • The CSS Handbook
      • Loading Animation
      • CSS Grid Layout
      • Background Image Size
      • Flexbox
  • Series
    • จาวาสคริปต์เบื้องต้น
      • 1: รู้จักกับจาวาสคริปต์
  • Articles
    • Visualization
      • Dash
        • Introducing Dash
    • Finance
      • PyPortfolioOpt
      • Best Libraries for Finance
      • Detection of price support
      • Portfolio Optimization
      • Python Packages For Finance
    • Django
      • เริ่มต้น Django RestFramework
    • General
      • Heroku คืออะไร
      • How to Crack Passwords
    • Notebook
      • IPython Documentation
      • Importing Notebooks
      • Google Colab for Data Analytics
      • Creating Interactive Dashboards
      • The Definitive Guide
      • A gallery of interesting Jupyter Notebooks
      • Advanced Jupyter Notebooks
      • Converting HTML to Notebook
    • Pandas
      • Pandas_UI
      • Pandas Style API
      • Difference Between two Dataframes
      • 19 Essential Snippets in Pandas
      • Time Series Analysis
      • Selecting Columns in a DataFrame
      • Cleaning Up Currency Data
      • Combine Multiple Excel Worksheets
      • Stylin’ with Pandas
      • Pythonic Data Cleaning
      • Make Excel Faster
      • Reading Excel (xlsx) Files
      • How to use iloc and loc for Indexing
      • The Easiest Data Cleaning Method
    • Python
      • pip install package
      • Automating your daily tasks
      • Convert Speech to Text
      • Tutorial, Project Ideas, and Tips
      • Image Handling and Processing
        • Image Processing Part I
        • Image Processing Part II
        • Image tutorial
        • Image Processing with Numpy
        • Converts PIL Image to Numpy Array
      • Convert Dictionary To JSON
      • JSON Dump
      • Speech-to-Text Model
      • Convert Text to Speech
      • Tips & Tricks
        • Fundamentals for Data Science
        • Best Python Code Examples
        • Top 50 Tips & Tricks
        • 11 Beginner Tips
        • 10 Tips & Tricks
      • Password hashing
      • psutil
      • Lambda Expressions
    • Web Scraping
      • Web Scraping using Python
      • Build a Web Scraper
      • Web Scraping for beginner
      • Beautiful Soup
      • Scrape Websites
      • Python Web Scraping
        • Web Scraping Part 1
        • Web Scraping Part 2
        • Web Scraping Part 3
        • Web Scraping Part 4
      • Web Scraper
    • Frontend
      • Book Online with GitBook
      • Progressive Web App คืออะไร
      • self-host a Hugo web app
  • Examples
    • Django
      • Build a Portfolio App
      • SchoolManagement
    • Flask
      • Flask Stock Visualizer
      • Flask by Example
      • Building Flask Apps
      • Flask 101
    • OpenCV
      • Build a Celebrity Look-Alike
      • Face Detection-OpenCV
    • Python
      • Make Game FLASH CARD
      • Sending emails using Google
      • ตรวจหาภาพซ้ำด้วย Perceptual hashing
        • Sending Emails in Python
      • Deck of Cards
      • Extract Wikipedia Data
      • Convert Python File to EXE
      • Business Machine Learning
      • python-business-analytics
      • Simple Blackjack Game
      • Python Turtle Clock
      • Countdown
      • 3D Animation : Moon Phases
      • Defragmentation Algorithm
      • PDF File
        • จัดการข้อความ และรูป จากไฟล์ PDF ด้วย PDFBox
      • Reading and Generating QR codes
      • Generating Password
        • generate one-time password (OTP)
        • Random Password Generator
        • Generating Strong Password
      • PyQt: Building Calculator
      • List Files in a Directory
      • [Project] qID – โปรแกรมแต่งรูปง่ายๆ เพื่อการอัพลงเว็บ
      • Python and Google Docs to Build Books
      • Tools for Record Linking
      • Create Responsive HTML Email
      • psutil()
      • Transfer Learning for Deep Learning
      • ดึงข้อมูลคุณภาพอากาศประเทศไทย
        • Image Classification
    • Web Scraper
      • Scrape Wikipedia Articles
        • Untitled
      • How Scrape Websites with Python 3
    • Finance
      • Algorithmic Trading for Beginners
      • Parse TradingView Stock
      • Creating a stock price database with MariaDB and python
      • Source Code
        • stocks-list
      • Visualizing with D3
      • Real Time Stock in Excel using Python
      • Create Stock Quote Module
      • The Magic Formula Lost Its Sparkle?
      • Stock Market Analysis
      • Stock Portfolio Analyses Part 1
      • Stock Portfolio Analyses Part 2
      • Build A Dashboard In Python
      • Stock Market Predictions with LSTM
      • Trading example
      • Algorithmic Trading Strategies
      • DOWNLOAD FUNDAMENTALS DATA
      • Algorithmic Trading
      • numfin
      • Financial Machine Learning
      • Algorithm To Predict Stock Direction
      • Interactive Brokers API Code
      • The (Artificially) Intelligent Investor
      • Create Auto-Updating Excel of Stock Market
      • Stock Market Predictions
      • Automate Your Stock Portfolio
      • create an analytics dashboard
      • Bitcoin Price Notifications
      • Portfolio Management
    • WebApp
      • CSS
        • The Best CSS Examples
      • JavaScript
        • Memory Game
      • School Clock
      • Frontend Tutorials & Example
      • Side Menu Bar with sub-menu
      • Create Simple CPU Monitor App
      • Vue.js building a converter app
      • jQuery
        • The Best jQuery Examples
      • Image Slideshow
      • Handle Timezones
      • Text to Speech with Javascript
      • Building Blog for Your Portfolio
      • Responsive Website Layout
      • Maths Homework Generator
  • Books
    • Finance
      • Python for Finance (O'Reilly)
    • Website
      • Hugo
        • Go Bootcamp
        • Hugo in Action.
          • About this MEAP
          • Welcome
          • 1. The JAM stack with Hugo
          • 2. Live in 30 minutes
          • 3. Using Markup for content
          • 4. Content Management with Hugo
          • 5. Custom Pages and Customized Content
          • 6. Structuring web pages
          • A Appendix A.
          • B Appendix B.
          • C Appendix C.
    • Python
      • ภาษาไพธอนเบื้องต้น
      • Python Cheatsheet
        • Python Cheatsheet
      • Beginning Python
      • IPython Cookbook
      • The Quick Python Book
        • Case study
        • Part 1. Starting out
          • 1. About Python
          • 2. Getting started
          • 3. The Quick Python overview
        • Part 2. The essentials
          • 14. Exceptions
          • 13. Reading and writing files
          • 12. Using the filesystem
          • 11. Python programs
          • 10. Modules and scoping rules
          • 9. Functions
          • 8. Control flow
          • 4. The absolute basics
          • 5. Lists, tuples, and sets
          • 6. Strings
          • 7. Dictionaries
        • Part 3. Advanced language features
          • 19. Using Python libraries
          • 18. Packages
          • 17. Data types as objects
          • 16. Regular expressions
          • 15. Classes and OOP
        • Part 4. Working with data
          • Appendix B. Exercise answers
          • Appendix A. Python’s documentation
          • 24. Exploring data
          • 23. Saving data
          • 20. Basic file wrangling
          • 21. Processing data files
          • 22. Data over the network
      • The Hitchhiker’s Guide to Python
      • A Whirlwind Tour of Python
        • 9. Defining Functions
      • Automate the Boring Stuff
        • 4. Lists
        • 5. Dictionaries
        • 12. Web Scraping
        • 13. Excel
        • 14. Google Sheets
        • 15. PDF and Word
        • 16. CSV and JSON
    • IPython
    • Pandas
      • จัดการข้อมูลด้วย pandas เบื้องต้น
      • Pandas Tutorial
  • Link Center
    • Temp
  • เทควันโด
    • รวมเทคนิค
    • Help and Documentation
  • Image
    • Logistics
Powered by GitBook
On this page
  • Tips for Selecting Columns in a DataFrame
  • Introduction
  • Why Do We Care About Selecting Columns?
  • The Data
  • Using iloc
  • iloc and boolean arrays
  • Caveats
  • Summary
  • Changes

Was this helpful?

  1. Articles
  2. Pandas

Selecting Columns in a DataFrame

Tue 26 November 2019

PreviousTime Series AnalysisNextCleaning Up Currency Data

Last updated 5 years ago

Was this helpful?

Posted by in

article header image

Introduction

This article will discuss several tips and shortcuts for using iloc to work with a data set that has a large number of columns. Even if you have some experience with using iloc you should learn a couple of helpful tricks to speed up your own analysis and avoid typing lots of column names in your code.

Why Do We Care About Selecting Columns?

In many standard data science examples, there are a relatively small number of columns. For example, Titanic has 8, Iris has 4, and Boston Housing has 14. Real-life data sets are messy and often include a lot of extra (potentially unnecessary) columns.

In data science problems you may need to select a subset of columns for one or more of the following reasons:

  • Filtering the data to only include the relevant columns can help shrink the memory footprint and speed up data processing.

  • Limiting the number of columns can reduce the mental overhead of keeping the data model in your head.

  • When exploring a new data set, it might be necessary to break to task into manageable chunks.

  • In some cases, you may need to loop through columns and perform calculations or cleanups in order to get the data in the format you need for further analysis.

  • Your data may just contain extra or duplicate information which is not needed.

Regardless of the reason, you may not need these techniques all the time. When you do, though, the tricks outlined below can reduce the amount of time you spend wrangling columns of data.

The Data

Let’s get started by reading in the data.

import pandas as pd
import numpy as np

df = pd.read_csv(
    'https://data.cityofnewyork.us/api/views/vfnx-vebw/rows.csv?accessType=DOWNLOAD&bom=true&format=true'
)

Sometimes it gets tricky to remember each column name and where it is by index. Here is a simple list comprehension to build up a reference list of all columns and their index.

col_mapping = [f"{c[0]}:{c[1]}" for c in enumerate(df.columns)]

Which creates a list like this:

['0:X',
'1:Y',
'2:Unique Squirrel ID',
'3:Hectare',
'4:Shift',
'5:Date',
 ...
'33:Borough Boundaries',
'34:City Council Districts',
'35:Police Precincts']

In some cases if you might want to rename a bunch of columns, you can use a dictionary comprehension to create a dictionary view of the data:

col_mapping_dict = {c[0]:c[1] for c in enumerate(df.columns)}

Which creates this dictionary:

{0: 'X',
1: 'Y',
2: 'Unique Squirrel ID',
3: 'Hectare',
4: 'Shift',
5: 'Date',
...
33: 'Borough Boundaries',
34: 'City Council Districts',
35: 'Police Precincts'}

Having these variables defined can be useful as you progress through your analysis. Instead of repeatedly looking at your original file, you can just double check the variable name during your analysis.

One other common task I frequently have is to rename a bunch of columns that are inconsistently named across files. I use a dictionary to easily rename all the columns using something like df.rename(columns=col_mapping) Typing all the column names can be an error prone task. A simple trick is to copy all the columns in excel and use pd.read_clipboard() to build a small DataFrame and turn the columns into a dictionary. I can then manually type in the new names, if need be.

Here is a quick example with this data set. Note that we pass in the sep to parse a tab delimited string:

df_cols = pd.read_clipboard(sep='\t')
col_mapping = {c[1]:'' for c in enumerate(df_cols.columns)}

Which creates a dictionary that is relatively easy to populate with new names:

{'X': '',
'Y': '',
'Unique': '',
'Squirrel': '',
'ID': '',
'Hectare': '',
'Shift': '',
...
'Police': '',
'Precincts': ''}

As an added bonus, you could even use an Excel file to set up the column renaming and automate the whole process. That tip is outside the scope of this article. If you are interested though, let me know in the comments.

Using iloc

For instance, if you want to look at just the Squirrel ID column of data for all rows:

df.iloc[:, 2]
0       37F-PM-1014-03
1       37E-PM-1006-03
2        2E-AM-1010-03
3        5D-PM-1018-05
4       39B-AM-1018-01
             ...
3018    30B-AM-1007-04
3019    19A-PM-1013-05
3020    22D-PM-1012-07
3021    29B-PM-1010-02
3022     5E-PM-1012-01
Name: Unique Squirrel ID, Length: 3023, dtype: object

If you want to look at the X and Y location as well as the ID, you can pass in a list of integers [0,1,2] :

df.iloc[:, [0,1,2]]

X

Y

Unique Squirrel ID

0

-73.956134

40.794082

37F-PM-1014-03

1

-73.957044

40.794851

37E-PM-1006-03

2

-73.976831

40.766718

2E-AM-1010-03

3

-73.975725

40.769703

5D-PM-1018-05

4

-73.959313

40.797533

39B-AM-1018-01

…

…

…

…

3018

-73.963943

40.790868

30B-AM-1007-04

3019

-73.970402

40.782560

19A-PM-1013-05

3020

-73.966587

40.783678

22D-PM-1012-07

3021

-73.963994

40.789915

29B-PM-1010-02

3022

-73.975479

40.769640

5E-PM-1012-01

3023 rows × 3 columns

Typing all the columns is not the most efficient, so we can use slicing notation to make this a little easier to understand:

df.iloc[:, 0:3]

Which will generate the same output as above.

If you have some experience with python lists, and have used pandas a bit; all of this usage should make sense. These points are pandas 101 concepts but we will build up from here.

While both of these approaches are straightforward, what if you want to combine the list of integers with the slice notation? You might try something like this:

df.iloc[:, [0:3,15:19]]
  File "<ipython-input-56-6f5b3426f412>", line 1
    df.iloc[:, [0:3,15:19]]
                 ^
SyntaxError: invalid syntax

Or, you could try something like this:

df.iloc[:, 0:3,15:19]
IndexingError: Too many indexers

Hmmm. That obviously doesn’t work but seems like it would be useful for selecting ranges as well as individual columns.

Here’s a slightly more elaborate example to show how it works on a combination of individual list items and sliced ranges:

np.r_[0:3,15:19,24,25]
array([ 0,  1,  2, 15, 16, 17, 18, 24, 25])

That’s kind of cool. This object has converted the combination of integer lists and slice notation into a single list which we can pass to iloc :

df.iloc[:, np.r_[0:3,15:19,24,25]]

Y

Unique Squirrel ID

Date

Hectare Squirrel Number

Age

Running

Chasing

Climbing

Eating

Foraging

Other Activities

Kuks

Quaas

Moans

Tail flags

Tail twitches

Approaches

Indifferent

Runs from

Other Interactions

0

40.794082

37F-PM-1014-03

10142018

3

NaN

False

False

False

False

False

NaN

False

False

False

False

False

False

False

False

NaN

1

40.794851

37E-PM-1006-03

10062018

3

Adult

True

False

False

False

False

NaN

False

False

False

False

False

False

False

True

me

2

40.766718

2E-AM-1010-03

10102018

3

Adult

False

False

True

False

False

NaN

False

False

False

False

False

False

True

False

NaN

3

40.769703

5D-PM-1018-05

10182018

5

Juvenile

False

False

True

False

False

NaN

False

False

False

False

False

False

False

True

NaN

…

…

…

…

…

…

…

…

…

…

…

…

…

…

…

…

…

…

…

…

…

3019

40.782560

19A-PM-1013-05

10132018

5

Adult

False

False

False

False

True

NaN

False

False

False

False

False

False

True

False

NaN

3020

40.783678

22D-PM-1012-07

10122018

7

Adult

False

False

False

True

True

NaN

False

False

False

False

False

False

True

False

NaN

3023 rows × 20 columns

Here is another tip. You may also use this notation when reading data using read_csv :

df_2 = pd.read_csv(
    'https://data.cityofnewyork.us/api/views/vfnx-vebw/rows.csv?accessType=DOWNLOAD&bom=true&format=true',
    usecols=np.r_[1,2,5:8,15:30],
)

I find this notation helpful when you have a data set where you want to keep non-sequential columns and do not want to type out the full names of all the columns.

One caveat I would make is that you need to be careful when using the slice notation and keep in mind that the last number in the range will not be included in the generated list of numbers.

For example, if we specify the range 2:4 , we only get a list of 2 and 3:

np.r_[2:4]
array([2, 3])

If you want to include column index 4, use np.r_[2:5] .

One final comment on np.r_ is that there is an optional step argument. In this example, we can specify that this list will increment by 2:

np.r_[2:10:2]
array([2, 4, 6, 8])

This is a bit of a more advanced option and is not going to be intuitively obvious to a new pandas user. However if you ever find yourself parsing a lot of columns by index, this might be a useful tool to navigate a tricky scenario.

iloc and boolean arrays

One of the most powerful ways to filter columns is to pass a boolean array to iloc to select a subset of columns. This sounds a little complex but a couple of examples should make this understandable.

The most important concept is that we don’t generate a boolean array by hand but use the output from another pandas function to generate the array and feed it to iloc .

In this case, we can use the str accessor on a column index just like any other column of pandas data. This will generate the necessary boolean array that iloc expects. An example should help make this clear.

If we want to see which columns contain the word “run”:

run_cols = df.columns.str.contains('run', case=False)
print(run_cols)
array([False, False, False, False, False, False, False, False, False,
    False, False, False, False, False, False,  True, False, False,
    False, False, False, False, False, False, False, False, False,
    False,  True, False, False, False, False, False, False, False])

Then we can pass this new array of boolean values to select only two columns:

df.iloc[:, run_cols].head()

Running

Runs from

0

False

False

1

True

True

2

False

False

3

False

True

4

False

False

In practice, many people will use a lambda function to do this in one line:

df.iloc[:, lambda df:df.columns.str.contains('run', case=False)]

The benefits of using str functions are that you can get sophisticated with the potential filter options. For instance, if we want all the columns with “district,” “precinct” or “boundaries” in the name:

df.iloc[:, lambda df: df.columns.str.contains('district|precinct|boundaries',
                                              case=False)].head()

Community Districts

Borough Boundaries

City Council Districts

Police Precincts

0

19

4

19

13

1

19

4

19

13

2

19

4

19

13

3

19

4

19

13

4

19

4

19

13

We can even combine all these concepts together by using the results of the boolean array to get the index then use np.r_ to combine these lists together.

Here is an example where we want to get all the location related columns as well as the squirrel ID into a DataFrame:

location_cols = df.columns.str.contains('district|precinct|boundaries',
                                        case=False)
location_indices = [i for i, col in enumerate(location_cols) if col]
df.iloc[:, np.r_[0:3,location_indices]].head()

X

Y

Unique Squirrel ID

Community Districts

Borough Boundaries

City Council Districts

Police Precincts

0

-73.956134

40.794082

37F-PM-1014-03

19

4

19

13

1

-73.957044

40.794851

37E-PM-1006-03

19

4

19

13

2

-73.976831

40.766718

2E-AM-1010-03

19

4

19

13

3

-73.975725

40.769703

5D-PM-1018-05

19

4

19

13

4

-73.959313

40.797533

39B-AM-1018-01

19

4

19

13

This code is a little complicated since we are using a conditional list comprehension and might be overkill for selecting 7 columns. The important concept is that you know it is possible and can refer back to this article when you need it for your own analysis.

Caveats

One item to keep in mind when dealing with numerical indexing of columns is that you need to understand where your data comes from. If you expect you ID column to always be in a specific location and it changes order in the data, you could face problems with your subsequent data processing. This situation is where your domain knowledge and expertise comes into play to make sure the solution is robust enough for the given situation.

Summary

Most of my data analysis involves filtering and selecting data at the row level. However there are times when it is helpful to work with data in a column-wise fashion. Pandas iloc can be a useful tool for quickly and efficiently working with data sets that have many columns of data. I hope this article provided a couple of tips that will help you with your own analysis.

Changes

  • 1-Dec-2019: Updated typos and clarified read_clipboard usage to use tab delimiter.

Also, if you like this type of content, I encourage you to check out Kevin Markham’s which served as an inspiration for a couple of the tips below.

In order to illustrate some examples, I’m going to use a quirky data set from the . Yes, apparently there was an effort to count and catalog squirrels in Central Park. I thought this would be a fun example to work through. It also gave me a chance to include a squirrel image (credit: ) in my post :) .

This data includes 3,023 rows of data and 31 columns. While 31 columns is not a tremendous number of columns, it is a useful example to illustrate the concepts you might apply to data with many more columns.

If you want to follow along, you can view the or pull it directly from .

The primary we will walk through is panda’s iloc which is used for integer-location based indexing. New users may be slightly confused because iloc and loc can take a boolean-array which leads to more powerful indexing. Since both functions can take a boolean array as input, there are times when these functions produce the same output. However, for the scope of this post, I will focus only on iloc column selection.

Here’s a simple graphic to illustrate the primary usage for iloc :

Fortunately there is a that can help us out. The r_ object will “Translate slice objects to concatenation along the first axis.” It might not make much sense from the documentation but it does exactly what we need.

Reference :

pandas tricks
Central Park Squirrel Census
GeorgeB2
set
notebook
github
function
numpy object
https://pbpython.com/selecting-columns.html
Tips for Selecting Columns in a DataFrame
Chris Moffitt
articles
iloc illustrated