Recoding variables is sometimes necessary when you want to create new variable groups, convert categorical values to numeric ones, or vice versa. In this section, we do this by writing a function and applying it to a variable. If you are rusty on writing functions in Python, it may help to review them before continuing.
Data used for Examples
The data set used on this page was downloaded from Kaggle.com, where it was posted by the user Miroslav Sabo. To download it, go to our GitHub page (https://github.com/Opensourcefordatascience/Data-sets), or get it from Kaggle (https://www.kaggle.com/miroslavsabo/young-people-survey).
Note: The file from our GitHub page is modified from the original .csv file. In our version, a “Participant Number” column has been added. This column is arbitrarily assigned.
Recoding using Functions
In our data set, there is a variable named “Village – town” that contains the categorical values “city” and “village”.
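A quick way to check which values a categorical column contains is the pandas `value_counts()` method. The sketch below uses a small made-up DataFrame in place of the survey file, and it assumes the column header in the CSV is spelled "Village - town" with a plain hyphen; the counts shown are from the toy data, not the real survey.

```python
import pandas as pd

# Toy stand-in for the survey data (values are illustrative only).
# Assumption: the real CSV header is spelled "Village - town".
df = pd.DataFrame({"Village - town": ["city", "village", "city", "city", "village"]})

# value_counts() lists each distinct value and how often it occurs
counts = df["Village - town"].value_counts()
print(counts)
```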
However, we want the values to reflect the variable name, i.e. to have the values “village” and “town”. To do this, we will write a function that recodes every “city” value to “town”, and then apply it to that variable. This introduces the .apply() method.
For the recoding to stick, we must assign the result back to the variable itself, because .apply() returns a new set of values rather than changing the original column in place.
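Here is a minimal sketch of that pattern, again on a small made-up DataFrame and assuming the column is named "Village - town". The function name `recode_city` is a hypothetical choice for illustration; values other than "city" (including missing values) are passed through unchanged.

```python
import pandas as pd

# Toy stand-in for the survey column (assumed name "Village - town")
df = pd.DataFrame({"Village - town": ["city", "village", "city", "village"]})

def recode_city(value):
    """Hypothetical helper: return "town" for "city", otherwise the value unchanged."""
    if value == "city":
        return "town"
    return value

# .apply() calls recode_city on every element and returns a NEW Series;
# assigning it back to the same column is what makes the change stick.
df["Village - town"] = df["Village - town"].apply(recode_city)

print(df["Village - town"].tolist())  # ['town', 'village', 'town', 'village']
```

Without the assignment on the left-hand side, `df["Village - town"].apply(recode_city)` would only produce the recoded values and leave `df` as it was.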
After recoding, all the values are either “village” or “town”.
Creating a New Variable with Recoding
You can easily create a new variable that contains recoded values of another variable. To do this, follow all the steps above with one exception: instead of assigning the recoded values back to the original variable, assign them to a new variable.
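As a sketch of that one change, the same kind of recoding function can write its result into a new column, leaving the original column intact. The toy data, the helper `recode_city`, and the new column name "Village - town recoded" are all hypothetical choices for illustration.

```python
import pandas as pd

# Toy stand-in for the survey column (assumed name "Village - town")
df = pd.DataFrame({"Village - town": ["city", "village", "city"]})

def recode_city(value):
    """Hypothetical helper: "city" becomes "town"; other values are unchanged."""
    return "town" if value == "city" else value

# Same .apply() call as before, but the result goes into a NEW column,
# so the original "Village - town" values are kept.
df["Village - town recoded"] = df["Village - town"].apply(recode_city)
print(df)
```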
In this example, we will take the age variable, which contains numeric values, and create a new variable with categorical values that represent age groups.
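One way this could look is sketched below. It assumes the numeric column is named "Age"; the toy ages, the helper `age_group`, the new column name "Age group", and the cut-off points are all hypothetical and only illustrate the pattern. Missing ages are checked first, because comparisons such as `age < 18` are always False for NaN and would otherwise fall into the last group.

```python
import pandas as pd

# Toy ages; None becomes NaN in a numeric column
df = pd.DataFrame({"Age": [15, 17, 19, 22, 30, None]})

def age_group(age):
    """Hypothetical helper mapping a numeric age to a label (cut-offs are illustrative)."""
    if pd.isna(age):
        return None  # keep missing ages missing
    if age < 18:
        return "under 18"
    if age < 21:
        return "18-20"
    return "21 and over"

# Store the categorical labels in a new variable; "Age" itself is unchanged
df["Age group"] = df["Age"].apply(age_group)
print(df)
```

For simple numeric binning, pandas also provides `pd.cut`, which can be a more compact alternative to a hand-written function; the function approach shown here keeps the logic consistent with the recoding steps above.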