Using Survey Weights
In R, the survey package makes it possible to work with survey
weights. Let’s use the data frame below as an example:
set.seed(1000)
age <- c(18:100)
age <- sample(age, 100, replace = TRUE)
gender <- c("Male", "Female")
gender <- sample(gender, 100, replace = TRUE)
country <- c("A", "B")
country <- sample(country, 100, replace = TRUE)
df <- data.frame(age, gender, country)
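# weight each gender so that it makes up half of the weighted sample (50 of 100)
df$weight <- NA_real_
df$weight[df$gender == "Female"] <- 50 / sum(df$gender == "Female")
df$weight[df$gender == "Male"] <- 50 / sum(df$gender == "Male")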
# summary of data
summary(df)
##       age           gender            country              weight
##  Min.   :18.00   Length:100         Length:100         Min.   :0.8929
##  1st Qu.:38.75   Class :character   Class :character   1st Qu.:0.8929
##  Median :54.50   Mode  :character   Mode  :character   Median :0.8929
##  Mean   :55.79                                         Mean   :1.0000
##  3rd Qu.:73.25                                         3rd Qu.:1.1364
##  Max.   :97.00                                         Max.   :1.1364
The most important variable here is weight, which is constructed so that women and men each account for half of the weighted sample.
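The arithmetic behind these weights can be written out as follows. This is only a restatement of the code above, assuming the 44 women and 56 men that the unweighted table in this article reports for the sample of N = 100; the symbols w_g and n_g are introduced here for illustration.

```latex
w_g = \frac{N/2}{n_g}, \qquad
w_{\text{Female}} = \frac{50}{44} \approx 1.1364, \qquad
w_{\text{Male}} = \frac{50}{56} \approx 0.8929, \qquad
\sum_i w_i = 44 \cdot \frac{50}{44} + 56 \cdot \frac{50}{56} = 100
```

Because the weights sum to the sample size, their mean is 1, which matches the summary output.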
The survey package provides the svydesign() function, which links a data frame with a weight variable.
# install.packages("survey")
library(survey)
df.w <- svydesign(ids = ~1, data = df, weights = ~weight)
The resulting object is no longer a data frame but a list of several components, which can be inspected with the attributes() function.
attributes(df.w)
## $names
## [1] "cluster" "strata" "has.strata" "prob" "allprob"
## [6] "call" "variables" "fpc" "pps"
##
## $class
## [1] "survey.design2" "survey.design"
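As a minimal sketch of how to look inside such a design object, the example below builds a small toy data set (hypothetical, not the article's df) and reads back the stored data and weights. It assumes the survey package is installed; the names toy and toy.w are made up for illustration.

```r
library(survey)

# Toy data (hypothetical): 2 women and 3 men
toy <- data.frame(
  age    = c(25, 40, 31, 58, 63),
  gender = c("Female", "Female", "Male", "Male", "Male")
)

# Give each gender half of the total weight, as in the article
toy$weight <- ifelse(toy$gender == "Female", 2.5 / 2, 2.5 / 3)

toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# The original data frame is kept in the "variables" component
head(toy.w$variables)

# weights() returns the sampling weights attached to the design
w <- weights(toy.w)
sum(w)
```

The weights are stored internally as inclusion probabilities (the prob component), so weights() is the more convenient way to retrieve them.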
Therefore, we need to use survey’s own analytical functions. For example, here is a comparison of the unweighted and weighted sex ratios.
# unweighted (the %>% pipe comes from magrittr, which dplyr also re-exports)
library(magrittr)
df %>%
  {table(.$gender)} %>%
  prop.table()
##
## Female Male
## 0.44 0.56
# weighted
df.w %>%
svytable(~gender, .) %>%
prop.table()
## gender
## Female Male
## 0.5 0.5
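To see what the weighted table is doing, the sketch below reproduces it by hand on a small toy data set (hypothetical, not the article's df): a weighted proportion is the sum of the weights in a group divided by the sum of all weights. The names toy, toy.w, by_hand and by_survey are made up for illustration.

```r
library(survey)

# Toy data (hypothetical): 2 women and 3 men, each gender given half the weight
toy <- data.frame(
  gender = c("Female", "Female", "Male", "Male", "Male")
)
toy$weight <- ifelse(toy$gender == "Female", 2.5 / 2, 2.5 / 3)

toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# By hand: sum of weights per gender divided by the total weight
by_hand <- tapply(toy$weight, toy$gender, sum) / sum(toy$weight)

# With survey: weighted counts turned into proportions
by_survey <- prop.table(svytable(~gender, toy.w))

by_hand
by_survey
```

Both approaches give the same proportions; svytable() has the advantage of working directly on the design object.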
svytable() can also create tables with more than one dimension. For example, let’s create a contingency table of gender and country, with proportions computed within each country:
df.w %>%
svytable(~gender+country, .) %>%
prop.table(2)
##         country
## gender           A         B
##   Female 0.5600000 0.4329897
##   Male   0.4400000 0.5670103
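The second argument of prop.table() decides which margin the proportions are computed within. The sketch below, on a small toy data set (hypothetical, not the article's df), contrasts row and column proportions; the names toy, toy.w, tab, by_row and by_col are made up for illustration.

```r
library(survey)

# Toy data (hypothetical): gender and country for five respondents
toy <- data.frame(
  gender  = c("Female", "Female", "Male", "Male", "Male"),
  country = c("A", "B", "A", "B", "A")
)
toy$weight <- ifelse(toy$gender == "Female", 2.5 / 2, 2.5 / 3)

toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Weighted two-way table: gender in rows, country in columns
tab <- svytable(~gender + country, toy.w)

# margin = 1: proportions within each gender (rows sum to 1)
by_row <- prop.table(tab, 1)

# margin = 2: proportions within each country (columns sum to 1)
by_col <- prop.table(tab, 2)

by_row
by_col
```

Omitting the margin altogether gives cell proportions of the whole weighted table.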
Below are other useful functions from the survey package:
# to compute weighted mean
svymean(~age, df.w)
# to compute weighted quantiles
svyquantile(~age, df.w, c(.25, .50, .75))
# to compute weighted variance
svyvar(~age, df.w)
# to perform t-test
svyttest(age~gender, df.w)
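As a rough check on what svymean() computes, the sketch below compares its point estimate with base R's weighted.mean() on a small toy data set (hypothetical, not the article's df), and uses svyby() to get the same statistic by group. The names toy, toy.w and m are made up for illustration.

```r
library(survey)

# Toy data (hypothetical): 2 women and 3 men
toy <- data.frame(
  age    = c(25, 40, 31, 58, 63),
  gender = c("Female", "Female", "Male", "Male", "Male")
)
toy$weight <- ifelse(toy$gender == "Female", 2.5 / 2, 2.5 / 3)

toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Weighted mean with a design-based standard error
m <- svymean(~age, toy.w)
coef(m)  # point estimate
SE(m)    # standard error

# The point estimate agrees with base R's weighted.mean()
weighted.mean(toy$age, toy$weight)

# The same statistic computed separately for each gender
svyby(~age, ~gender, toy.w, svymean)
```

The point estimates agree; what the survey functions add is a standard error that takes the design into account.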