Hello everyone! It is starting to heat up over here in California, and it's a great time to hunker down and do some cleaning - data cleaning!
What is that, you may ask? Cleaning is how we get our dataset organized and ready for analysis. It's one of those things that can seem a bit tedious at times, but is super-important to get right. We are asking several questions about stress and the work environment. To get clear answers to these questions, a well-organized, easy-to-read database is essential.
Most of the cleaning I have to do is because the online survey software likes to assign its own codes and labels to different data points, and they aren't very easy to pull up and read. So I go in and fix a lot of those codes and labels, without changing the actual responses from participants.
Some of my mini-projects include:
- addressing responses that were entered in duplicate, or that are not relevant to the study questions
- handling missing data (there are a few strategies researchers use for this, depending on the situation)
- standardizing the way certain information, like salary, is reported. For example, two people may have the same salary, but enter it differently (say, "$45,000" and "45k") - my job is to convert salary information to the same format so I can report on those numbers.
- renaming variables and labels for easier analysis. For example, "PSS1" is a lot easier to read than "Q14.1_Q14.1_1".
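For the curious, here is a minimal sketch of what a few of these steps might look like in Python. It is not the actual script used for this study. The column names `ResponseId` and `Q20`, the new `salary_usd` column, and the helper names `parse_salary` and `clean_rows` are all made up for illustration; only the "Q14.1_Q14.1_1" to "PSS1" rename comes from the example above. It also assumes salaries are entered as yearly amounts.

```python
import re

# Hypothetical mapping from survey-software export names to readable names.
# Only the PSS1 rename comes from the post; "Q20" is an invented example.
RENAMES = {"Q14.1_Q14.1_1": "PSS1", "Q20": "salary"}

# Accepts entries such as "45000", "45k", or "45000 per year"
# (after "$" and "," have been removed).
SALARY_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(k)?(?:\s*(?:/|per|a)\s*(?:yr|year))?"
)


def parse_salary(raw):
    """Return a yearly salary as a float, or None if blank or unreadable."""
    if raw is None:
        return None
    text = raw.strip().lower().replace(",", "").replace("$", "")
    match = SALARY_PATTERN.fullmatch(text)
    if not match:
        return None  # treated as missing rather than guessed at
    amount = float(match.group(1))
    if match.group(2):  # a trailing "k" means thousands
        amount *= 1000
    return amount


def clean_rows(rows):
    """Rename columns, drop repeated responses, and add a numeric salary."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        renamed = {RENAMES.get(key, key): value for key, value in row.items()}
        response_id = renamed.get("ResponseId")
        if response_id in seen_ids:
            continue  # keep only the first copy of a duplicated response
        seen_ids.add(response_id)
        # The participant's original entry stays in "salary";
        # the standardized number goes in a separate column.
        renamed["salary_usd"] = parse_salary(renamed.get("salary"))
        cleaned.append(renamed)
    return cleaned
```

Writing the standardized number to its own column, instead of overwriting what was typed, keeps the participant's original response untouched. Blank or unreadable entries are simply left as missing here; how to handle them properly depends on the situation.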
I have started to run some preliminary numbers, but nothing is official until the database is in tip-top shape. Truth be told, I do enjoy the cleaning process, but the real fun starts soon!
Hi, I'm Stephania! I investigate job-related stress in peer support providers, and write about my work here.