Wednesday, January 14, 2015

What does missing value mean in SPSS?

Let's say we are given the task of delivering a tazkirah (a short religious reminder) to 10 youngsters. To keep a record of the task, we note how many minutes of tazkirah we delivered to each of them. For example, for Mat A we gave 5 minutes, and for Minah C we gave 7 minutes (a bit longer, because there was a lot of talk about K-pop).
When we check the records, we suddenly notice that for one guy we recorded the number 0. Does this mean we forgot to give him a tazkirah, or did we really give him less than 1 minute of tazkirah? To avoid this kind of confusion, it is recommended that we use a number that makes no logical sense to mark that we did not give him a tazkirah. For example, we use the number 99. Never in our lives have we given a one-to-one tazkirah lasting 99 minutes.
In this case, the number 99 is a 'missing value', a term used in SPSS. It is not that the value got lost. Rather, it marks a value that does not exist, so that we are clear the value really is not there, and not because we forgot to record it, got carried away, or slipped up from listening to too much K-pop.
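To see why the 99 code matters, here is a minimal sketch in plain Python (not SPSS itself). The list of minutes is made up for illustration, and the name MISSING is just an assumption for this example. It compares the average when 99 is wrongly treated as real minutes with the average when 99 is treated as a declared missing value.

```python
from statistics import mean

# Hypothetical sentinel code meaning "no tazkirah was given".
MISSING = 99

# Made-up minutes of tazkirah for 10 people (illustration only).
# The 0 here is a genuine record: less than 1 minute was given.
minutes = [5, 7, 3, 99, 4, 6, 0, 5, 99, 8]

# Keep only the real scores by dropping the declared missing code.
valid = [m for m in minutes if m != MISSING]

naive_mean = mean(minutes)          # wrongly counts 99 as real minutes
valid_mean = mean(valid)            # ignores the missing code
n_missing = minutes.count(MISSING)  # how many people got no tazkirah

print(naive_mean, valid_mean, n_missing)
```

The 0 stays in the calculation because it is an actual score, while the two 99s are left out because they only mark that no score exists.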

In SPSS, once we have declared 99 as a missing value for a variable, it cannot be treated as an extreme value or an outlier for that variable.
For example, in a survey, the variable sex is given two possible values: 1 = male, 2 = female. Let's say someone forgot to answer this question or is not sure what sex they are. So, rather than leaving a blank in the SPSS file, we enter 99 to indicate that the respondent did not give any answer. Most other respondents gave either 1 or 2, and these are very different from 99. But we cannot say 99 is an extreme score or an outlier, because it is NOT an actual score for the respondent. Compare this to another scenario.
In another survey, respondents were asked to write their age. All of the respondents were between 20 and 25 except for one person who is one year short of being a centenarian (i.e. 99 years old). Don't ask me why that person was included in the survey. This is a fictional story to make a point. So, let's just pretend that this 99-year-old participant had responded to the survey. Now, when we key in 99 for that person's age, we are not saying that he did not write his age (i.e. 99 is NOT a missing value). In fact, 99 IS that person's actual age (or score, if you understand it that way). In this case, we can say that 99 is an extreme value compared to the rest of the values (which are between 20 and 25). It can be justified as an outlier score.
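The contrast between the two scenarios can be sketched in plain Python (again, not SPSS itself). The data, the function name flag_outliers and the 1.5 x IQR rule are all assumptions chosen for this illustration. The post does not say which outlier rule to use, and an IQR rule is applied to the sex codes only to mirror the comparison made above.

```python
from statistics import quantiles

def flag_outliers(values, missing_codes=()):
    """Return values outside the 1.5 x IQR fences, after dropping
    any codes declared as missing (hypothetical helper)."""
    valid = [v for v in values if v not in missing_codes]
    q1, _, q3 = quantiles(valid, n=4)  # quartiles of the real scores only
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in valid if v < low or v > high]

# Made-up survey data (illustration only).
sex = [1, 2, 2, 1, 99, 2, 1, 1, 2]            # 99 = no answer given
age = [20, 21, 22, 23, 24, 25, 22, 23, 99]    # 99 = a real 99-year-old

# For sex, 99 is declared missing, so it is never a candidate outlier.
sex_outliers = flag_outliers(sex, missing_codes=(99,))

# For age, nothing is declared missing, so 99 is an actual score.
age_outliers = flag_outliers(age)

print(sex_outliers, age_outliers)
```

The same number 99 ends up ignored in one variable and flagged in the other, purely because of what we declared it to mean.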
