Series and Indexes are equipped with a set of string processing methods that make it easy to operate on each element of the array. Perhaps most importantly, these methods skip missing/NA values automatically. They are accessed via the str attribute and generally have names matching the equivalent (scalar) built-in string methods.
To lowercase data, use str.lower(), which converts all uppercase characters in each string to lowercase. A string with no uppercase characters is returned unchanged. To uppercase data, use str.upper(), which converts all lowercase characters in each string to uppercase. A string with no lowercase characters is returned unchanged.
Code #1:
# Import pandas package
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# converting and overwriting values in column
df["Name"] = df["Name"].str.lower()

print(df)
The next example uses the nba.csv file.
Code #2:
# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("nba.csv")

# converting and overwriting values in column
data["Team"] = data["Team"].str.upper()

# display data
data
Output :
As shown in the output image of data frame, all values in the Team column have been converted into upper case.
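The NA handling mentioned at the start of this article can be checked with a small sketch. The Series below is made-up sample data, not part of nba.csv. It only illustrates that the missing entry stays missing while the other strings are converted.

```python
import pandas as pd

# Made-up sample data with one missing value (assumption for illustration)
names = pd.Series(["Jai", "PRINCI", None, "anuj"])

# The missing entry is skipped rather than raising an error
lowered = names.str.lower()
uppered = names.str.upper()

print(lowered.tolist())
print(uppered.tolist())
```

Calling lower() on None in plain Python would raise an AttributeError, which is one reason to prefer the .str methods on columns that may contain gaps.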
Splitting and Replacing Data
To split data, use str.split(). Python's built-in split() returns a list of strings after breaking a single string at the specified separator, but it works on one string at a time. The Pandas str.split() method applies the same operation to every element of a Series. The method must be called through the .str accessor, because a Series has no split() method of its own and calling it directly raises an AttributeError. To replace text, use str.replace(), which works like Python's str.replace() method but on every element of a Series. It also has to be called through .str, because Series.replace() is a different method that replaces whole values rather than substrings.
Code #1:
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Knnuaj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# dropping rows with null values to avoid errors
df.dropna(inplace=True)

# expand=True returns a two-column data frame, so the result is stored
# in two new columns (the column names here are illustrative)
df[["Address_1", "Address_2"]] = df["Address"].str.split("a", n=1, expand=True)

# df display
print(df)
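To make the effect of the expand parameter concrete, here is a minimal sketch on a three-element Series (sample values taken from the Address column above). It assumes only the n and expand parameters of str.split() that the example already uses.

```python
import pandas as pd

addresses = pd.Series(["Nagpur", "Kanpur", "Allahabad"])

# Without expand, each element becomes a list of pieces
as_lists = addresses.str.split("a", n=1)

# With expand=True, the pieces are spread over DataFrame columns 0 and 1
as_frame = addresses.str.split("a", n=1, expand=True)

print(as_lists.tolist())
print(as_frame)
```

Because n=1 allows at most one split per string, every row yields exactly two pieces here, and the match is case-sensitive, so the capital "A" in "Allahabad" is not treated as a separator.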
Code #2:
# importing pandas module
import pandas as pd

# reading csv file
data = pd.read_csv("nba.csv")

# overwriting column with replaced value of age
# (Age is numeric, so this uses Series.replace on whole values,
# not .str.replace on substrings)
data["Age"] = data["Age"].replace(25.0, "Twenty five")

# creating a filter for age column
# where age = "Twenty five"
mask = data["Age"] == "Twenty five"

# displaying only the filtered rows
data.where(mask).dropna()
Output :
As shown in the output image, every value in the Age column equal to 25.0 has been replaced by “Twenty five”.
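The difference between str.replace() and Series.replace() is easy to miss, so the following sketch contrasts them on a small made-up Series. The regex=False argument is passed so that the pattern is treated as a literal string.

```python
import pandas as pd

# Made-up sample data (assumption for illustration)
cities = pd.Series(["Nagpur junction", "Kanpur junction"])

# str.replace works on substrings inside each string
substring = cities.str.replace("junction", "jn", regex=False)

# Series.replace with a scalar only matches whole values,
# so a substring such as "junction" matches nothing
whole = cities.replace("junction", "jn")

# A whole-value match does get replaced
whole_match = cities.replace("Kanpur junction", "Kanpur")

print(substring.tolist())
print(whole.tolist())
print(whole_match.tolist())
```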
Concatenation of Data
To concatenate a Series or Index, use str.cat(), which joins each string of the calling Series with the corresponding string that is passed in. The values can come from a different Series; a plain list or array must have the same length as the calling Series. Python strings have no cat() method, so this operation is available only through the .str accessor.
Code #1:
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# making copy of address column
new = df["Address"].copy()

# concatenating address with name column
# overwriting name column
df["Name"] = df["Name"].str.cat(new, sep=", ")

# display
print(df)
Code #2:
# importing pandas module
import pandas as pd

# reading csv file
data = pd.read_csv("nba.csv")

# making copy of team column
new = data["Team"].copy()

# concatenating team with name column
# overwriting name column
data["Name"] = data["Name"].str.cat(new, sep=", ")

# display
data
Output:
As shown in the output image, every string in the Team column has been appended to the string at the same index in the Name column, with “, ” as the separator.
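Real data such as a CSV file may have gaps in either column. The sketch below, on made-up sample data, shows how str.cat() treats a missing value by default and how the na_rep parameter can substitute a placeholder. The placeholder text "unknown" is an arbitrary choice for illustration.

```python
import pandas as pd

# Made-up sample data with one missing city (assumption for illustration)
names = pd.Series(["Jai", "Princi", "Gaurav"])
cities = pd.Series(["Nagpur", None, "Allahabad"])

# By default a missing value on either side makes the result missing
default = names.str.cat(cities, sep=", ")

# na_rep substitutes a placeholder for missing values instead
filled = names.str.cat(cities, sep=", ", na_rep="unknown")

print(default.tolist())
print(filled.tolist())
```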
Removing Whitespace from Data
To remove whitespace, use str.strip(), str.rstrip() and str.lstrip(). These methods handle whitespace (including newlines) in any text data. As the names suggest, str.lstrip() removes whitespace from the left side of each string, str.rstrip() removes it from the right side, and str.strip() removes it from both sides. Since these Pandas methods share their names with Python's built-in string methods, they must be called through the .str accessor so that they are applied to every element of the Series.
Code #1:
# importing pandas module
import pandas as pd

# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Nagpur junction', 'Kanpur junction',
                    'Nagpur junction', 'Kannuaj junction'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# replacing address name and adding spaces in start and end
new = df["Address"].replace("Nagpur junction", " Nagpur junction ").copy()

# checking with custom string
print(new.str.strip() == " Nagpur junction")
print(new.str.strip() == "Nagpur junction ")
print(new.str.strip() == " Nagpur junction ")
Code #2:
# importing pandas module
import pandas as pd

# making data frame
data = pd.read_csv("nba.csv")

# replacing team name and adding spaces in start and end
new = data["Team"].replace("Boston Celtics", " Boston Celtics ").copy()

# checking with custom removed space string
new.str.lstrip() == "Boston Celtics "
Output of Code #1 for str.lower() :
As shown in the output image of the data frame, all values in the Name column have been converted to lowercase.
Output of Code #1 for str.split() :
As shown in the output image, each Address string was split at the first occurrence of “a” only, and not at later occurrences, because the n parameter was set to 1 (at most one split per string).
Output of Code #1 for str.cat() :
As shown in the output image, every string in the Address column has been appended to the string at the same index in the Name column, with “, ” as the separator.
Output of Code #1 for str.strip() :
As shown in the output image, the comparison returns False for all 3 conditions, which means the spaces were successfully removed from both sides and the strings no longer have leading or trailing spaces.
Output of Code #2 for str.lstrip() :
As shown in the output image, the comparison is True for the Boston Celtics rows after removing the left-side spaces, because only the trailing space remains.
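The three stripping methods can be compared side by side in a short sketch. The team names below are made-up sample strings padded with spaces, a tab and a newline to show that all kinds of whitespace are removed.

```python
import pandas as pd

# Made-up sample strings with whitespace on both sides (assumption)
teams = pd.Series(["  Boston Celtics  ", "\tUtah Jazz\n"])

left = teams.str.lstrip()    # leading whitespace removed
right = teams.str.rstrip()   # trailing whitespace removed
both = teams.str.strip()     # whitespace removed on both sides

print(left.tolist())
print(right.tolist())
print(both.tolist())
```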