Using Survey Weight in R

Fahim Ahmad | 2022-05-14

In R, working with survey weights is made possible by the survey package. Let’s use the data frame below as an example:

set.seed(1000)
age <- c(18:100)
age <- sample(age, 100, replace = TRUE)
gender <- c("Male", "Female")
gender <- sample(gender, 100, replace = TRUE)
country <- c("A", "B")
country <- sample(country, 100, replace = TRUE)

df <- data.frame(age, gender, country)

df$weight[df$gender == "Female"] <- 50 / sum(df$gender == "Female")
df$weight[df$gender == "Male"] <- 50 / sum(df$gender == "Male")

# summary of data
summary(df)
##       age           gender            country              weight      
##  Min.   :18.00   Length:100         Length:100         Min.   :0.8929  
##  1st Qu.:38.75   Class :character   Class :character   1st Qu.:0.8929  
##  Median :54.50   Mode  :character   Mode  :character   Median :0.8929  
##  Mean   :55.79                                         Mean   :1.0000  
##  3rd Qu.:73.25                                         3rd Qu.:1.1364  
##  Max.   :97.00                                         Max.   :1.1364

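The most important variable here is weight, which is constructed to balance the sex ratio. Each respondent's weight is 50 divided by the number of respondents of the same gender, so the weights within each gender sum to 50 and the weighted sample is half female and half male.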
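The weighting rule used above (target group size divided by observed group size) can be checked on a tiny example. The sketch below uses a hypothetical four-row data frame called `toy`, not the data from this post, and only base R.

```r
# Toy data: 3 females and 1 male (hypothetical values, for illustration only)
toy <- data.frame(gender = c("Female", "Female", "Female", "Male"))

# Target: each gender should account for half of the sample (2 of 4 rows)
target <- nrow(toy) / 2

# Weight = target group size / observed group size
group_size <- table(toy$gender)
toy$weight <- as.numeric(target / group_size[toy$gender])

# Weighted group totals: both genders now add up to the target
weight_totals <- tapply(toy$weight, toy$gender, sum)
weight_totals
```

Because the over-represented group gets weights below 1 and the under-represented group gets weights above 1, the weights still sum to the number of rows.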

The survey package provides the svydesign() function, which links a data frame with its weights.

# install.packages("survey")
library(survey)
df.w <- svydesign(ids = ~1, data = df, weights = ~weight)

The resulting object is no longer a data frame but a list of components, whose names can be seen with the attributes() function.

attributes(df.w)
## $names
## [1] "cluster"    "strata"     "has.strata" "prob"       "allprob"    "call"       "variables"  "fpc"       
## [9] "pps"       
## 
## $class
## [1] "survey.design2" "survey.design"
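To make the structure of a design object more concrete, here is a minimal sketch that pulls a few components back out of it. The data frame `toy` and its weights are made up for illustration; the component names (`variables`, `prob`) are the ones listed in the output above.

```r
library(survey)

# Toy data with made-up weights (hypothetical, for illustration only)
toy <- data.frame(
  age    = c(20, 30, 40, 50),
  weight = c(0.5, 0.5, 1.5, 1.5)
)
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# The original data frame is stored in the "variables" component
toy.w$variables

# The sampling weights can be recovered with weights()
w <- weights(toy.w)
w

# "prob" holds the sampling probabilities, i.e. 1 / weight
p <- toy.w$prob
p
```

In practice it is usually safer to go through accessor functions such as weights() than to reach into the list directly, since the internal layout is an implementation detail of the package.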

Therefore, we need to use the survey package’s own analysis functions. For example, here is a comparison of the unweighted and weighted sex ratios.

# the %>% pipe comes from magrittr (it is also available via dplyr)
library(magrittr)
# unweighted
df %>%
  {table(.$gender)} %>%
  prop.table()
## 
## Female   Male 
##   0.44   0.56
# weighted
df.w %>%
  svytable(~gender, .) %>%
  prop.table()
## gender
## Female   Male 
##    0.5    0.5
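For a simple design like this one, a weighted count is just the sum of the weights in each category. The sketch below, on a hypothetical `toy` data frame, compares svytable() with the same counts computed by hand using base R's xtabs().

```r
library(survey)

# Toy data: 3 females and 1 male, weighted to a 50/50 split (hypothetical)
toy <- data.frame(
  gender = c("Female", "Female", "Female", "Male"),
  weight = c(2/3, 2/3, 2/3, 2)
)
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Weighted counts from the survey package
svy_counts <- svytable(~gender, toy.w)

# The same counts by hand: sum of weights per gender
manual_counts <- xtabs(weight ~ gender, data = toy)

# Both give the same weighted proportions
prop.table(svy_counts)
prop.table(manual_counts)
```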

svytable() can also create two-way (or higher) frequency and percentage tables. For example, let’s create a contingency table of gender by country, with proportions computed within each country (column proportions):

df.w %>%
  svytable(~gender + country, .) %>%
  prop.table(2)
##         country
## gender           A         B
##   Female 0.5600000 0.4329897
##   Male   0.4400000 0.5670103
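The second argument of prop.table() decides which margin the proportions are computed within. As a small sketch on hypothetical data (`toy` and its weights are made up), margin 2 makes each column sum to 1 and margin 1 makes each row sum to 1.

```r
library(survey)

# Toy data with made-up weights (hypothetical, for illustration only)
toy <- data.frame(
  gender  = c("Female", "Female", "Male", "Male"),
  country = c("A", "B", "A", "A"),
  weight  = c(1, 1, 0.5, 1.5)
)
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

tab <- svytable(~gender + country, toy.w)

# Column proportions: share of each gender within a country
col_prop <- prop.table(tab, 2)
col_prop

# Row proportions: share of each country within a gender
row_prop <- prop.table(tab, 1)
row_prop
```

Calling prop.table() without a margin divides by the grand total instead, which gives cell proportions of the whole weighted sample.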

Below are some other useful functions from the survey package:

# to compute weighted mean
svymean(~age, df.w)
# to compute weighted quantiles
svyquantile(~age, df.w, c(.25, .50, .75))
# to compute weighted variance
svyvar(~age, df.w)
# to perform a t-test
svyttest(age~gender, df.w)
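As a sanity check on what svymean() estimates, the sketch below compares it with base R's weighted.mean() on a hypothetical `toy` data frame. The point estimates agree; what the survey package adds is a design-based standard error, which can be extracted with SE().

```r
library(survey)

# Toy data with made-up weights (hypothetical, for illustration only)
toy <- data.frame(
  age    = c(20, 30, 40, 50),
  weight = c(0.5, 0.5, 1.5, 1.5)
)
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Weighted mean and its standard error from the survey package
est <- svymean(~age, toy.w)
coef(est)  # point estimate
SE(est)    # design-based standard error

# The same point estimate by hand: sum(age * weight) / sum(weight)
manual <- weighted.mean(toy$age, toy$weight)
manual
```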