Dealing with missing values during data pre-processing in Python

Data pre-processing is a crucial step in any data processing task. By a commonly quoted estimate, around eighty percent of the total work in data analysis and machine learning is data munging, i.e. bringing the data into a form that makes sense for further analysis.
Python has excellent libraries for data cleaning and pre-processing. In this tutorial we will look at the popular scikit-learn (sklearn) library. As an example, let's take a relatively small dataset that contains data about the average salary of people from a few countries:
S.N  Country  Hours  Salary   House
0    France   34.0   12000.0  No
1    Spain    37.0   49000.0  Yes
2    Germany  20.0   34000.0  No
3    Spain    58.0   41000.0  No
4    Germany  40.0   NaN      Yes
5    France   45.0   28000.0  Yes
6    Spain    NaN    51000.0  No
7    France   28.0   89000.0  Yes
8    Germany  50.0   53000.0  No
9    France   47.0   33000.0  Yes
We can see that there are missing values in the Hours and Salary columns. One way to fix this is to remove the rows that contain missing data, but that is not a good approach: if there are many missing values, we lose a lot of data. Instead, we are going to replace each missing value with the mean of its column, which is a reasonable estimate and lets us keep every row.
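To make the trade-off concrete, here is a minimal pandas sketch (not the author's original code) that types the example table in as a DataFrame and compares the two options: dropping incomplete rows with `dropna()` versus filling each gap with its column mean via `fillna()`.

```python
import numpy as np
import pandas as pd

# The example dataset from the table above; np.nan marks a missing value.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Hours": [34.0, 37.0, 20.0, 58.0, 40.0, 45.0, np.nan, 28.0, 50.0, 47.0],
    "Salary": [12000.0, 49000.0, 34000.0, 41000.0, np.nan,
               28000.0, 51000.0, 89000.0, 53000.0, 33000.0],
    "House": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: fill each missing value with the mean of its column.
# DataFrame.mean() skips NaN by default, so only observed values are averaged.
filled = df.fillna(df[["Hours", "Salary"]].mean())

print(len(df), len(dropped), len(filled))  # 10 8 10
```

Even on this tiny dataset, dropping rows throws away 20% of the data, while mean filling keeps all ten rows.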
Let’s start by importing some common libraries required for data cleaning:
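The original import code is not shown, so the following is a sketch under assumptions: it uses NumPy, pandas, and `SimpleImputer` from `sklearn.impute` (the imputer class in current scikit-learn releases), and it builds the dataset by hand from the table above instead of loading it from a file. It then counts the missing values per column, which is a quick way to confirm what we saw by eye.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # scikit-learn's imputer for filling gaps

# The example dataset; np.nan marks a missing value.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Hours": [34.0, 37.0, 20.0, 58.0, 40.0, 45.0, np.nan, 28.0, 50.0, 47.0],
    "Salary": [12000.0, 49000.0, 34000.0, 41000.0, np.nan,
               28000.0, 51000.0, 89000.0, 53000.0, 33000.0],
    "House": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Count the missing values in each column.
print(df.isna().sum())
```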
Now let’s create the imputer object and define how it should replace missing values.
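A minimal sketch of this step, assuming scikit-learn's `SimpleImputer` (the article's original code is not shown, and older scikit-learn versions used a different imputer class). `fit` learns one mean per column from the non-missing values, and `transform` fills the gaps with those means. Only the numeric Hours and Salary columns are passed in, since a mean is meaningless for Country or House.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Hours": [34.0, 37.0, 20.0, 58.0, 40.0, 45.0, np.nan, 28.0, 50.0, 47.0],
    "Salary": [12000.0, 49000.0, 34000.0, 41000.0, np.nan,
               28000.0, 51000.0, 89000.0, 53000.0, 33000.0],
    "House": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

numeric_cols = ["Hours", "Salary"]

# Treat NaN as "missing" and replace it with the column mean.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit() computes the mean of each column; transform() fills the NaNs with it.
imputer.fit(df[numeric_cols])
df[numeric_cols] = imputer.transform(df[numeric_cols])

print(imputer.statistics_)  # learned means, roughly 39.89 and 43333.33
print(df.round(1))
```

Separating `fit` from `transform` matters in practice: the means can be learned on training data only and then reused to fill gaps in new data.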
After imputation, the data looks like this:
S.N  Country  Hours  Salary   House
0    France   34.0   12000.0  No
1    Spain    37.0   49000.0  Yes
2    Germany  20.0   34000.0  No
3    Spain    58.0   41000.0  No
4    Germany  40.0   43333.3  Yes
5    France   45.0   28000.0  Yes
6    Spain    39.9   51000.0  No
7    France   28.0   89000.0  Yes
8    Germany  50.0   53000.0  No
9    France   47.0   33000.0  Yes
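Besides mean, other strategies you can use include median, which is less affected by outliers such as the 89000.0 salary, and most_frequent. The last one is especially useful when you are dealing with non-numerical columns, where a mean or median is not defined.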
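A short sketch comparing these strategies on the same data, again assuming scikit-learn's `SimpleImputer`. To exercise `most_frequent` on a text column, it blanks out the country in row 3; that missing country is hypothetical and not part of the original dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Hours": [34.0, 37.0, 20.0, 58.0, 40.0, 45.0, np.nan, 28.0, 50.0, 47.0],
    "Salary": [12000.0, 49000.0, 34000.0, 41000.0, np.nan,
               28000.0, 51000.0, 89000.0, 53000.0, 33000.0],
    "House": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
})

# Hypothetical: pretend the country in row 3 was never recorded.
df.loc[3, "Country"] = np.nan

numeric_cols = ["Hours", "Salary"]

# median: fills each gap with the middle observed value of its column.
median_imputer = SimpleImputer(strategy="median")
numeric_median = median_imputer.fit_transform(df[numeric_cols])

# most_frequent: fills each gap with the most common value; works for strings.
mode_imputer = SimpleImputer(strategy="most_frequent")
countries = mode_imputer.fit_transform(df[["Country"]])

print(median_imputer.statistics_)  # medians of Hours and Salary: 40 and 41000
print(countries[3, 0])  # France
```

With nine observed values per column, the medians are simply the fifth-smallest values (40.0 hours and 41000.0 salary), and France is the most common remaining country, so it fills the blanked-out entry.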
When you print the data after imputation, you can see that the missing values have been replaced with the column means.
This is one way to take care of missing values in Python.