pydatatable

by Selva Prabhakaran | Posted on August 31, 2019

101 Python datatable Exercises (pydatatable)

Python datatable is a relatively new package for data manipulation and analysis in Python. It carries the spirit of R’s data.table with similar syntax. It is designed to be fast, often considerably faster than pandas, and it can work with out-of-memory data. It is on a path to become a must-use package for data manipulation in Python.

1. How to import datatable package and check the version?

Difficulty Level: L1

import datatable as dt
dt.__version__

# '0.8.0'

You need to import datatable as dt for the rest of the code in these exercises to work.

2. How to create a datatable Frame from a list, numpy array, pandas dataframe?

Difficulty Level: L1

Question: Create a datatable Frame from a list, numpy array and pandas dataframe.

Input:

import pandas as pd
import numpy as np

my_list = list('abcedfghijklmnopqrstuvwxyz')
my_arr = np.arange(26)
my_df = pd.DataFrame(dict(col1=my_list, col2=my_arr))

Solution:

import pandas as pd
import numpy as np
import datatable as dt

# Inputs
my_list = list('abcedfghijklmnopqrstuvwxyz')
my_arr  = np.arange(26)
my_df   = pd.DataFrame(dict(col1=my_list, col2=my_arr))


# Solution
dt_df1  = dt.Frame(my_list)
dt_df2  = dt.Frame(my_arr)
dt_df3  = dt.Frame(my_df)
dt_df4  = dt.Frame(A=my_arr, B= my_list)

3. How to import csv file as a pydatatable Frame?

Difficulty Level: L1

Question: Read a CSV file as a datatable Frame.

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df.head(5)

4. How to read first 5 rows of pydatatable Frame?

Difficulty Level: L1

Question: Read first 5 rows of datatable Frame.

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', max_nrows= 5)
df

5. How to add new column in pydatatable Frame from a list?

Difficulty Level: L1

Question: Read first 5 rows of datatable Frame and add a new column of length 5.

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', max_nrows= 5)

# Solution
df[:,"new_column"] = dt.Frame([1,2,3,4,5])
df

6. How to do addition of existing columns to get a new column in pydatatable Frame?

Difficulty Level: L1

Question: Add age and rad columns to get a new column in datatable Frame.

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Solution
df[:,"new_column"] = df[:, dt.f.age + dt.f.rad]

7. How to get the int value of a float column in a pydatatable Frame?

Difficulty Level: L1

Question: Get the int value of a float column dis in datatable Frame.

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Solution
df[:, "new_column"] = df[:, dt.int32(dt.f.dis)]
df.head(5)

8. How to create a new column based on a condition in a datatable Frame?

Difficulty Level: L2

Question: Create a new column with the value ‘Old’ if age is greater than 60, else ‘New’, in a datatable Frame.

# Solution
import datatable as dt
import numpy as np  # needed for np.where below
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df[:, "new_column"] = dt.Frame(np.where(df[:, dt.f.age > 60], 'Old', 'New'))
df.head(5)

9. How to left join two datatable Frames?

Difficulty Level: L1

Question: Left join two Frames.

Input:

import datatable as dt
df1 = dt.Frame(A=[1,2,3,4],B=["a", "b", "c", "d"])
df2 = dt.Frame(A=[1,2,3,4,5],C=["a2", "b2", "c2", "d2", "e2"])

Primary Key: A

import datatable as dt
df1 = dt.Frame(A=[1,2,3,4],B=["a", "b", "c", "d"])
df2 = dt.Frame(A=[1,2,3,4,5],C=["a2", "b2", "c2", "d2", "e2"])
df2.key = "A"
output = df1[:, :, dt.join(df2)]
output

10. How to rename a column in a pydatatable Frame?

Difficulty Level: L1

Question: Rename column zn to zn_new in a datatable Frame.

import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df.names = {'zn': 'zn_new'}
df.head(5)

11. How to import every 50th row from a csv file to create a datatable Frame?

Difficulty Level: L2

Question: Import every 50th row of the BostonHousing dataset (BostonHousing.csv) as a datatable Frame.

# Solution: Use csv reader. Unfortunately there isn't an option to do it directly using fread()
import datatable as dt
import csv          
with open('local/path/to/BostonHousing.csv', 'r') as f:
    reader = csv.reader(f)
    for i, row in enumerate(reader):
        row = [[x] for x in row]
        # 1st row
        if i == 0:  
            df = dt.Frame(row)
            header = [x[0] for x in df[0,:].to_list()]
            df.names =  header
            del df[0,:]  
        # Every 50th row
        elif i%50 ==0:
            df_temp = dt.Frame(row)
            df_temp.names = header
            df.rbind(df_temp)

df.head(5)

12. How to change column values when importing csv to a Python datatable Frame?

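Difficulty Level: L2

Question: While importing the BostonHousing dataset, change the medv column (column index 13) so that values greater than 25 become 'High' and all other values become 'Low'.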

# Solution: Use csv reader. open() cannot read from a URL, so download the file first
import datatable as dt
import csv
with open('local/path/to/BostonHousing.csv', 'r') as f:
    reader = csv.reader(f)
    for i, row in enumerate(reader):
        row = [[x] for x in row]
        if i == 0:
            df = dt.Frame(row)
            header = [x[0] for x in df[0,:].to_list()]
            df.names =  header
            del df[0,:]  
        else:
            row[13] = ['High'] if float(row[13][0]) > 25 else ['Low']
            df_temp = dt.Frame(row)
            df_temp.names = header
            df.rbind(df_temp)

df.head(5)

13. How to change value at particular row and column in a Python datatable Frame?

Difficulty Level: L1

Question: Change the value at row number 2 and column number 1 to 5 in a datatable Frame.

# Solution: It follows row, column indexing. No need to use ".iloc" or ".loc"
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df[2,1] = 5
df.head(5)

14. How to delete specific cell, row, column, row per condition in a datatable Frame?

Difficulty Level: L2

Questions:

  1. Delete the cell at position 2,1.

  2. Delete the 3rd row.

  3. Delete the chas column.

  4. Delete rows where column zn is having 0 value.

# Solution: It follows row, column indexing. No need to use ".iloc" or ".loc"
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Delete the cell at position `2,1`.
del df[2,1]

# Delete the `3rd` row.
del df[3,:]

# Delete the `chas` column.
del df[:,"chas"]

# Delete rows where column `zn` is having 0 value.
del df[dt.f.zn == 0,:]

df.head(5)

15. How to convert datatable Frame to pandas, numpy, dictionary, list, tuples, csv files?

Difficulty Level: L1

Question: Convert datatable Frame to pandas, numpy, dictionary, list, tuples, csv files.

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# to pandas df
pd_df = df.to_pandas()

# to numpy arrays
np_arrays = df.to_numpy()

# to dictionary
dic = df.to_dict()

# to list
list_ = df[:,"indus"].to_list()

# to tuple
tuples_ = df[:,"indus"].to_tuples()

# to csv 
df.to_csv("BostonHousing.csv")

16. How to get data types of all the columns in the datatable Frame?

Difficulty Level: L1

Question: Get data types of all the columns in the datatable Frame.

Desired Output:

crim : stype.float64
zn : stype.float64
indus : stype.float64
chas : stype.bool8
nox : stype.float64
rm : stype.float64
age : stype.float64
dis : stype.float64
rad : stype.int32
tax : stype.int32
ptratio : stype.float64
b : stype.float64
lstat : stype.float64
medv : stype.float64


17. How to get summary stats of each column in datatable Frame?

Difficulty Level: L1

Questions:

For each column:

  1. Get the sum of the column values.

  2. Get the max of the column values.

  3. Get the min of the column values.

  4. Get the mean of the column values.

  5. Get the standard deviation of the column values.

  6. Get the mode of the column values.

  7. Get the number of occurrences of the modal value in each column.

  8. Get the number of unique values in column.

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df.sum()
df.max()
df.min()
df.mean()
df.sd()
df.mode()
df.nmodal()
df.nunique()

18. How to get the column stats of particular column of the datatable Frame?

Difficulty Level: L1

Question: Get the max value of the zn column of the datatable Frame.

Desired Output: 100

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Solution
df[:,dt.max(dt.f.zn)]

19. How to apply group by functions in datatable Frame?

Difficulty Level: L1

Question: Compute the mean Price for each Manufacturer.

Desired Output:

     Manufacturer         C0
0            None  28.550000
1           Acura  15.900000
2            Audi  33.400000
3             BMW  30.000000
4           Buick  21.625000
5        Cadillac  37.400000
..
30     Volkswagen  18.025000
31          Volvo  22.700000

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
df[:, dt.mean(dt.f.Price), dt.by("Manufacturer")].head(5)

20. How to arrange datatable Frame in ascending order by column value?

Difficulty Level: L1

Question: Arrange datatable Frame in ascending order by Price.

Desired Output:

  Manufacturer    Model     Type  Min.Price  Price  Max.Price  MPG.city  \
0       Saturn       SL    Small        9.2    NaN       12.9       NaN   
1       Toyota    Camry  Midsize       15.2    NaN       21.2      22.0   
2         Ford  Festiva    Small        6.9    7.4        7.9      31.0   
3      Hyundai    Excel    Small        6.8    8.0        9.2      29.0   
4        Mazda      323    Small        7.4    8.3        9.1      29.0   


   Width  Turn.circle Rear.seat.room  Luggage.room  Weight   Origin  \
0   68.0         40.0           26.5           NaN  2495.0      USA   
1   70.0         38.0           28.5          15.0  3030.0  non-USA   
2   63.0         33.0           26.0          12.0  1845.0      USA   
3   63.0         35.0           26.0          11.0  2345.0  non-USA   
4   66.0         34.0           27.0          16.0  2325.0  non-USA   

            Make  
0      Saturn SL  
1   Toyota Camry  
2   Ford Festiva  
3  Hyundai Excel  
4      Mazda 323

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution1
df.sort("Price")

# Solution2
df[:,:, dt.sort(dt.f.Price)].head(5)

21. How to arrange datatable Frame in descending order by column value?

Difficulty Level: L1

Question: Arrange datatable Frame in descending order by Price.

Desired Output:

    Manufacturer     Model     Type  Min.Price  Price  Max.Price  MPG.city  \
0  Mercedes-Benz      300E  Midsize       43.8   61.9       80.0      19.0   
1       Infiniti       Q45  Midsize       45.4   47.9        NaN      17.0   
2       Cadillac   Seville  Midsize       37.5   40.1       42.7      16.0   
3      Chevrolet  Corvette   Sporty       34.6   38.0       41.5      17.0   
4           Audi       100  Midsize        NaN   37.7       44.6      19.0   

   MPG.highway             AirBags DriveTrain  ... Passengers  Length  \
0         25.0  Driver & Passenger       Rear  ...        5.0     NaN   
1         22.0                None       Rear  ...        5.0   200.0   
2         25.0  Driver & Passenger      Front  ...        5.0   204.0   
3         25.0         Driver only       Rear  ...        2.0   179.0   
4         26.0  Driver & Passenger       None  ...        6.0   193.0   

   Wheelbase  Width  Turn.circle Rear.seat.room  Luggage.room  Weight  \
0      110.0   69.0         37.0            NaN          15.0  3525.0   
1      113.0   72.0         42.0           29.0          15.0  4000.0   
2      111.0   74.0         44.0           31.0           NaN  3935.0   
3       96.0   74.0         43.0            NaN           NaN  3380.0   
4      106.0    NaN         37.0           31.0          17.0  3405.0   

    Origin                Make  
0  non-USA  Mercedes-Benz 300E  
1  non-USA        Infiniti Q45  
2      USA    Cadillac Seville  
3     None  Chevrolet Corvette  
4  non-USA            Audi 100

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df[::-1,:, dt.sort(dt.f.Price)].head()

22. How to repeat (append) the same data in datatable Frame?

Difficulty Level: L1

Question: Repeat (append) the same data 5 times in a datatable Frame.

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
dt.repeat(df, 5)

23. How to replace string with another string in entire datatable Frame?

Difficulty Level: L1

Question: Replace Audi with My Dream Car in the entire datatable Frame.

24. How to extract the details of a particular cell with a given criterion?

Difficulty Level: L1

Question: Extract which manufacturer, model and type has the highest Price.

Desired Output:

 Manufacturer  Model     Type
 Mercedes-Benz  300E  Midsize

25. How to rename a specific column in a datatable Frame?

Difficulty Level: L2

Question: Rename the column Model to Car Model.

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
old_col_name = "Model"
new_col_name = "Car Model"
df.names = [new_col_name if x == old_col_name else x for x in df.names]
df.head(5)

26. How to count NA values in every column of a datatable Frame?

Difficulty Level: L1

Question: Count NA values in every column of a datatable Frame.

Desired Output:

   Manufacturer  Model  Type  Min.Price  Price  Max.Price  MPG.city  \
0             4      1     3          7      2          5         9   

   MPG.highway  AirBags  DriveTrain  ...  Passengers  Length  Wheelbase  \
0            2        6           7  ...           2       4          1   

   Width  Turn.circle  Rear.seat.room  Luggage.room  Weight  Origin  Make  
0      6            5               4            19       7       5     3

27. How to get a specific column from a datatable Frame as a datatable Frame instead of a series?

Difficulty Level: L1

Question: Get the column (Model) of a datatable Frame as a datatable Frame (rather than as a Series).

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df[:,"Model"].head(5)

28. How to reverse the order of columns of a datatable Frame?

Difficulty Level: L1

Question: Reverse the order of columns in the Cars93 datatable Frame.

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
# A reversed slice in the column position flips the column order
df[:, ::-1].head(5)

29. How to format or suppress scientific notations in Python datatable Frame?

Difficulty Level: L2

Question: Suppress scientific notation like ‘e-03’ in df and print up to 6 digits after the decimal point.

Input

import datatable as dt
import numpy as np
df = dt.Frame(random=np.random.random(4)**10)
df
 #        random
0  3.518290e-04
1  5.104371e-02
2  5.895886e-06
3  1.274671e-09

Desired Output

         random   random2
0  3.518290e-04  0.000352
1  5.104371e-02  0.051044
2  5.895886e-06  0.000006
3  1.274671e-09  0.000000
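
# Solution
import datatable as dt
import numpy as np
df = dt.Frame(random=np.random.random(4)**10)
# Format each value as a fixed-point string with 6 decimals and store it in a new column
df[:,"random2"] = dt.Frame(['%.6f' % x for x in df[:,"random"].to_list()[0]])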
df

30. How to filter every nth row in a pydatatable?

Difficulty Level: L1

Question: From df, filter the 'Manufacturer', 'Model' and 'Type' for every 20th row starting from 1st (row 0).

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df[::20, ['Manufacturer', 'Model', 'Type']]

31. How to reverse the rows of a python datatable Frame?

Difficulty Level: L2

Question: Reverse all the rows.

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df[::-1,:]

32. How to find out which column contains the highest number of row-wise maximum values?

Difficulty Level: L2

Question: Which column holds the row-wise maximum in the largest number of rows?

Desired Output: tax

# Input
import datatable as dt
df = dt.fread("https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv")

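# Solution
# For every row, find the column holding the largest value, then report the most frequent winner
counts = dict.fromkeys(df.names, 0)
for row in zip(*df.to_list()):
    counts[df.names[row.index(max(row))]] += 1
print(max(counts, key=counts.get))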

33. How to normalize all columns in a dataframe?

Difficulty Level: L2

Questions:

  1. Normalize all columns of df by subtracting the column mean and dividing by the standard deviation.

  2. Rescale all columns of df so that the minimum value in each column is 0 and the maximum is 1.

Don’t use external packages like sklearn.

Desired Output:

       crim    zn     indus  chas       nox        rm       age       dis  \
0  0.000000  0.18  0.067815   0.0  0.314815  0.577505  0.641607  0.269203   
1  0.000236  0.00  0.242302   0.0  0.172840  0.547998  0.782698  0.348962   
2  0.000236  0.00  0.242302   0.0  0.172840  0.694386  0.599382  0.348962   
3  0.000293  0.00  0.063050   0.0  0.150206  0.658555  0.441813  0.448545   
4  0.000705  0.00  0.063050   0.0  0.150206  0.687105  0.528321  0.448545   

        rad       tax   ptratio         b     lstat      medv  
0  0.000000  0.208015  0.287234  1.000000  0.089680  0.422222  
1  0.043478  0.104962  0.553191  1.000000  0.204470  0.368889  
2  0.043478  0.104962  0.553191  0.989737  0.063466  0.660000  
3  0.086957  0.066794  0.648936  0.994276  0.033389  0.631111  
4  0.086957  0.066794  0.648936  1.000000  0.099338  0.693333
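
# Input
import datatable as dt
df = dt.fread("https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv")

# Solution (question 2: min-max scaling)
for i in df.names:
    col_min = df[:, dt.min(dt.f[i])][0, 0]
    col_max = df[:, dt.max(dt.f[i])][0, 0]
    df[:, i] = df[:, (dt.f[i] - col_min) / (col_max - col_min)]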
df.head(5)

34. How to compute grouped mean on datatable Frame and keep the grouped column as another column?

Difficulty Level: L1

Question: In df, compute the mean price of every fruit, while keeping the fruit as a regular column instead of an index.

Input

import datatable as dt
import numpy as np

df = dt.Frame(fruit = ['apple', 'banana', 'orange'] * 3,
              rating = np.random.rand(9),
              price  = np.random.randint(0, 15, 9))

Desired Output:

    fruit        C0
0   apple  7.666667
1  banana  5.000000
2  orange  8.333333

35. How to join two datatable Frames by 2 columns?

Difficulty Level: L2

Question: Join dataframes df1 and df2 by ‘A’ and ‘B’.

Input

df1 = dt.Frame(A=[1, 2, 3, 4],
               B=["a", "b", "c", "d"],
               D=[1, 2, 3, 4])

df2 = dt.Frame(A=[1, 2, 4, 5],
               B=["a", "b", "d", "e"],
               C=["a2", "b2", "d2", "e2"])

Desired Output:

   A  B  D   C
0  1  a  1  a2
1  2  b  2  b2
2  3  c  3  
3  4  d  4  d2

# Input
import datatable as dt
df1 = dt.Frame(A=[1, 2, 3, 4], B=["a", "b", "c", "d"], D=[1, 2, 3, 4])
df2 = dt.Frame(A=[1, 2, 4, 5], B=["a", "b", "d", "e"], C=["a2", "b2", "d2", "e2"])

# Solution
# Key df2 on the join columns, then left-join it onto df1
df2.key = ["A","B"]
output = df1[:, :, dt.join(df2)]
output

36. How to create leads (column shifted up by 1 row) of a column in a datatable Frame?

Difficulty Level: L2

Question: Create a new column in df that is a lead of 1 (column A shifted up by 1 row).

Input:

df = dt.Frame(A=[1,2,3,4],B=["a", "b", "c", "d"],d=[1,2,3,4])

Desired Output:

   A  B  d  A.1
0  1  a  1    2
1  2  b  2    3
2  3  c  3    4
3  4  d  4  NaN

Machine Learning Exercise

36. How to use FTRL Model to calculate the probability of a person having diabetes?

Difficulty Level: L3

Question 1: Use Follow the Regularized Leader (Ftrl) Model to calculate the probability of a person having diabetes.

Question 2: Find the feature importance of the features used in model.

Input:

import datatable as dt
import numpy as np
from datatable.models import Ftrl

# Import data
train_df = dt.fread('pima_indian_diabetes_training_data.csv')
test_df = dt.fread('pima_indian_diabetes_testing_data.csv')

# Create an Ftrl model with default parameters
ftrl_model = Ftrl()

# Alternatively, set parameter values while creating the model
ftrl_model = Ftrl(alpha = 0.1, lambda1 = 0.5, lambda2 = 0.6)

# Or change the parameters of an existing model
ftrl_model.alpha = 0.1
ftrl_model.lambda1 = 0.5
ftrl_model.lambda2 = 0.6

# Prepare training and test dataset
train_df[:,"diabetes"] = dt.Frame(np.where(train_df[:, dt.f["diabetes"] == "pos"], 1,0))
test_df[:,"diabetes"] = dt.Frame(np.where(test_df[:, dt.f["diabetes"] == "pos"], 1,0))

x_train = train_df[:, ["pregnant", "glucose", "pressure", "mass", "pedigree", "age"]]
y_train = train_df[:, ["diabetes"]]

x_test = test_df[:, ["pregnant", "glucose", "pressure", "mass", "pedigree", "age"]]
y_test = test_df[:, ["diabetes"]]

# training the model
ftrl_model.fit(x_train,y_train)

# predictions of the model
targets = ftrl_model.predict(x_test)
print(targets.head(5))

# feature importance
fi = ftrl_model.feature_importances
fi

Author: Ajay Kumar

Reference:

BostonHousing dataset: https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv
Cars93: https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv
pima_indian_diabetes_training_data.csv
pima_indian_diabetes_testing_data.csv
https://www.machinelearningplus.com/data-manipulation/101-python-datatable-exercises-pydatatable/