Cheatsheet for Basic Python Pandas Library

Compute the number of unique entries in a column of `df`.

len(df['column_name'].unique())  # NaN counts as one entry; df['column_name'].nunique() skips NaN
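A small self-contained check with a hypothetical `color` column shows how `len(unique())` and `nunique()` differ when the column contains `NaN`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one repeated value and one missing value
df = pd.DataFrame({"color": ["red", "blue", "red", np.nan]})

# unique() keeps NaN as a distinct entry, so len() counts it
n_with_nan = len(df["color"].unique())      # 3: 'red', 'blue', NaN

# nunique() drops NaN by default (dropna=True)
n_without_nan = df["color"].nunique()       # 2: 'red', 'blue'

# dropna=False makes nunique() match len(unique())
n_all = df["color"].nunique(dropna=False)   # 3
print(n_with_nan, n_without_nan, n_all)
```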

Select rows where a column has a specific value

df[df['column_name'] == 'name']
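The expression inside the brackets is a boolean mask with one `True`/`False` per row. A sketch on hypothetical car data, also showing `isin()` for several values and `&` for combining conditions:

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "name": ["ford torino", "chevy c20", "ford pinto", "amc hornet"],
    "cylinders": [8, 8, 4, 6],
})

# Boolean mask: True where the column equals the target value
mask = df["name"] == "ford pinto"
exact = df[mask]

# Match any of several values
several = df[df["name"].isin(["ford torino", "amc hornet"])]

# Combine conditions with & (and) or | (or); each condition needs parentheses
combined = df[(df["cylinders"] == 8) & (df["name"].str.startswith("ford"))]
print(exact, several, combined, sep="\n\n")
```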

Calculate the total number of missing values (stored as `NaN`) in `df`.

df.isnull().sum().sum()
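`isnull()` returns a same-shaped frame of booleans; the first `.sum()` counts the `True` values per column and the second adds those per-column counts together. A sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three missing cells in total
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, "x"],
})

per_column = df.isnull().sum()      # a: 1, b: 2
total = df.isnull().sum().sum()     # 3

# isna() is an alias of isnull() and gives the same result
assert total == df.isna().sum().sum()
print(per_column, total, sep="\n")
```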

Concatenate two NumPy arrays

import numpy as np
res_3 = np.concatenate((res_1, res_2))  # joins the arrays along axis 0
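`np.concatenate` joins a sequence of arrays along an existing axis (axis 0 by default). A short sketch with made-up arrays, including the 2-D case where the non-joined dimensions must match:

```python
import numpy as np

# Hypothetical 1-D inputs
res_1 = np.array([1, 2, 3])
res_2 = np.array([4, 5])

# 1-D: arrays are joined end to end
res_3 = np.concatenate((res_1, res_2))           # [1 2 3 4 5]

# 2-D: axis=0 appends rows, axis=1 appends columns
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
rows = np.concatenate((a, b), axis=0)            # shape (3, 2)
cols = np.concatenate((a, b.T), axis=1)          # b.T has shape (2, 1), result (2, 3)
print(res_3, rows.shape, cols.shape)
```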

Correct misspelled names

df_cars['name'] = df_cars['name'].str.replace('chevroelt|chevrolet|chevy', 'chevrolet', regex=True)
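The pattern is a regular expression, so `regex=True` matters: since pandas 2.0, `str.replace` treats the pattern as literal text by default and the `|` alternatives would never match. A sketch with hypothetical car names; counting the first word of each name afterwards is one quick way to check that the spellings were merged:

```python
import pandas as pd

# Hypothetical names with inconsistent spellings of one make
df_cars = pd.DataFrame({
    "name": ["chevroelt chevelle", "chevy c20", "chevrolet impala", "ford torino"],
})

# regex=True makes '|' mean "or" instead of a literal character
df_cars["name"] = df_cars["name"].str.replace(
    "chevroelt|chevrolet|chevy", "chevrolet", regex=True
)

# Sanity check: the first word of each name is the make
makes = df_cars["name"].str.split().str[0]
print(makes.value_counts())
```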

Replace `?` placeholders with `NaN`

import numpy as np
df_cars['horsepower'] = df_cars['horsepower'].replace('?', np.nan).astype(float)  # '?' marks an unknown value
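A `?` placeholder keeps the whole column stored as text. The sketch below uses hypothetical values to compare two ways of turning it into a numeric column with real missing values: an exact-match `replace` followed by `astype(float)`, and `pd.to_numeric(..., errors='coerce')`. The second converts *every* unparseable entry to `NaN`, not only `?`, so it can silently hide other data problems.

```python
import numpy as np
import pandas as pd

# Hypothetical raw column, read as strings because of the '?' placeholder
df_cars = pd.DataFrame({"horsepower": ["130", "165", "?", "150"]})

# Option 1: replace only cells that are exactly '?', then convert to float
hp_replace = df_cars["horsepower"].replace("?", np.nan).astype(float)

# Option 2: coerce anything that cannot be parsed as a number to NaN
hp_coerce = pd.to_numeric(df_cars["horsepower"], errors="coerce")

print(hp_replace.tolist(), hp_coerce.isna().sum())
```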

Fill missing values with the column mean

meanhp = df_cars['horsepower'].mean()
df_cars['horsepower'] = df_cars['horsepower'].fillna(meanhp)
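`mean()` skips `NaN` by default (`skipna=True`), so the fill value is the average of the known entries only. A sketch with hypothetical numbers; the median fill is shown only as a common alternative when a column has outliers, not as something this cheatsheet prescribes:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value and one large value
df_cars = pd.DataFrame({"horsepower": [100.0, np.nan, 150.0, 350.0]})

meanhp = df_cars["horsepower"].mean()               # (100 + 150 + 350) / 3 = 200.0
filled_mean = df_cars["horsepower"].fillna(meanhp)

# Alternative: the median (150.0 here) is less sensitive to the large value
filled_median = df_cars["horsepower"].fillna(df_cars["horsepower"].median())
print(filled_mean.tolist(), filled_median.tolist())
```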

Create Dummy Variables

Values like ‘america’ cannot be used directly in an equation. Instead, we create three true/false columns that answer “Is this car American?”, “Is this car European?” and “Is this car Asian?”. These serve as independent variables without imposing any ordering on the three regions. Apply the code below:

cData = pd.get_dummies(df_cars, columns=['origin'])
cData
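A sketch with hypothetical data showing what `get_dummies` produces: the `origin` column is replaced by one indicator column per region, named `origin_<value>` and ordered alphabetically. In pandas 2.0 and later the indicators are `bool`; pass `dtype=int` if 0/1 columns are preferred. `drop_first=True` is an optional extra: because exactly one indicator is true per row, one column is redundant and can be dropped.

```python
import pandas as pd

# Hypothetical cars with string-valued origins
df_cars = pd.DataFrame({
    "mpg": [18.0, 26.0, 31.0, 15.0],
    "origin": ["america", "europe", "asia", "america"],
})

# One indicator column per region; 'origin' itself is removed
cData = pd.get_dummies(df_cars, columns=["origin"])
print(cData.columns.tolist())
# ['mpg', 'origin_america', 'origin_asia', 'origin_europe']

# Optional: drop the first category, which is implied when the others are all False
cData_k1 = pd.get_dummies(df_cars, columns=["origin"], drop_first=True)
print(cData_k1.columns.tolist())
```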
