Compute the number of unique entries in a column of df.
len(df.column_name.unique())
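A minimal sketch on a made-up DataFrame (the column name `color` and its values are assumptions for illustration). It shows that `unique()` counts NaN as an entry, while `nunique()` skips it by default:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two distinct colors plus one missing value.
df = pd.DataFrame({"color": ["red", "blue", "red", np.nan]})

# unique() returns NaN as one of the distinct entries.
n_with_nan = len(df["color"].unique())

# nunique() drops NaN unless dropna=False is passed.
n_without_nan = df["color"].nunique()
```

Which of the two counts is wanted depends on whether a missing value should count as its own category.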
Select rows where a column has a specific value
df[df['column_name'] == 'name']
Calculate the total number of missing values (they are NaN) in df.
df.isnull().sum().sum()
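The double `sum()` works in two steps: the first sums each column, the second adds the column totals together. A small sketch, assuming a made-up DataFrame with columns `a` and `b`:

```python
import numpy as np
import pandas as pd

# Hypothetical data with three missing cells in total.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, "x"]})

# First sum(): number of missing values per column.
per_column = df.isnull().sum()

# Second sum(): total across all columns.
total = int(per_column.sum())
```

The per-column counts are often the more useful view, since they show where the gaps are.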
Combine 2 arrays
res_3 = np.concatenate((res_1, res_2))
Correct misspelled names
df_cars['name'] = df_cars['name'].str.replace('chevroelt|chevrolet|chevy', 'chevrolet', regex=True)
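A short sketch of the replacement on made-up car names (the sample values are assumptions). The pattern uses `|` as a regular-expression alternation, so `regex=True` is passed explicitly; recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical names containing a typo, a nickname, and a correct spelling.
df_cars = pd.DataFrame(
    {"name": ["chevroelt malibu", "chevy nova", "chevrolet impala", "ford pinto"]}
)

# Any of the three variants is rewritten to the canonical 'chevrolet'.
df_cars["name"] = df_cars["name"].str.replace(
    "chevroelt|chevrolet|chevy", "chevrolet", regex=True
)

fixed_names = df_cars["name"].tolist()
```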
Replace the '?' placeholder with NaN
df_cars['horsepower'] = df_cars['horsepower'].str.replace('?', 'NaN', regex=False).astype(float)
Fill missing values with the column mean
meanhp = df_cars['horsepower'].mean()
df_cars['horsepower'] = df_cars['horsepower'].fillna(meanhp)
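The two steps above can also be sketched end to end. This version swaps in `pd.to_numeric(..., errors="coerce")` as an alternative to the string replacement; note that it turns every non-numeric entry into NaN, not only `'?'`. The sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical horsepower column read as strings, with '?' marking a gap.
df_cars = pd.DataFrame({"horsepower": ["130", "?", "150", "100"]})

# Convert to float; anything that is not a number becomes NaN.
df_cars["horsepower"] = pd.to_numeric(df_cars["horsepower"], errors="coerce")

# mean() ignores NaN, so it averages only the three known values.
meanhp = df_cars["horsepower"].mean()

# Fill the gap with that mean.
df_cars["horsepower"] = df_cars["horsepower"].fillna(meanhp)
```

Filling with the mean keeps the column average unchanged, but it shrinks the variance, so it is best treated as a quick default rather than a rule.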
Create Dummy Variables
Values like ‘america’ cannot be used directly in an equation, so we create three simple true/false columns whose titles amount to “Is this car American?”, “Is this car European?” and “Is this car Asian?”. These are used as independent variables without imposing any ordering between the three regions. Apply the code below.
cData = pd.get_dummies(df_cars, columns=['origin'])
cData
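To make the result concrete, here is a minimal sketch on a three-row frame. The `mpg` column and the exact region labels are assumptions; the real `origin` values may be spelled differently.

```python
import pandas as pd

# Hypothetical cars, one from each region.
df_cars = pd.DataFrame(
    {"mpg": [18.0, 26.0, 31.0], "origin": ["america", "europe", "asia"]}
)

# 'origin' is replaced by one indicator column per distinct value,
# named '<column>_<value>'.
cData = pd.get_dummies(df_cars, columns=["origin"])

dummy_cols = [c for c in cData.columns if c.startswith("origin_")]
```

Each row has exactly one indicator set, so the three columns carry the same information as the original `origin` column without implying that one region ranks above another.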