In R, working with survey weights is made possible by the survey package. Let’s use the data frame below as an example:
set.seed(1000)
age <- c(18:100)
age <- sample(age, 100, replace = TRUE)
gender <- c("Male", "Female")
gender <- sample(gender, 100, replace = TRUE)
country <- c("A", "B")
country <- sample(country, 100, replace = TRUE)
df <- data.frame(age, gender, country)
# weights: each gender's weights sum to 50, whatever its sample count
df$weight[df$gender == "Female"] <- 50 / sum(df$gender == "Female")
df$weight[df$gender == "Male"] <- 50 / sum(df$gender == "Male")
# summary of data
summary(df)
##       age           gender            country              weight
##  Min.   :18.00   Length:100         Length:100         Min.   :0.8929
##  1st Qu.:38.75   Class :character   Class :character   1st Qu.:0.8929
##  Median :54.50   Mode  :character   Mode  :character   Median :0.8929
##  Mean   :55.79                                         Mean   :1.0000
##  3rd Qu.:73.25                                         3rd Qu.:1.1364
##  Max.   :97.00                                         Max.   :1.1364
The most important variable here is weight, which is constructed to balance the sex ratio: each gender's weights sum to 50, so women and men contribute equally to any weighted estimate.
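As a quick sanity check on that construction, here is a minimal sketch in base R. It does not use the random sample above; it assumes a fixed split of 44 women and 56 men (the split implied by the unweighted proportions reported for this sample), and the names `n_by_gender` and `group_totals` are only illustrative.

```r
# Hypothetical stand-in for df$gender: 44 women and 56 men, no randomness
gender <- rep(c("Female", "Male"), times = c(44, 56))

# Same rule as in the data frame: each person gets 50 / (size of own group)
n_by_gender <- table(gender)
weight <- 50 / as.numeric(n_by_gender[gender])

# Weighted size of each group; both are 50 up to floating-point rounding
group_totals <- tapply(weight, gender, sum)
print(group_totals)

# The weights average to 1, so the weighted sample size stays at 100
print(mean(weight))
```

The smaller group (women) gets the larger weight, 50/44, and the larger group gets 50/56, which is why the summary shows only two distinct weight values.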
The survey package provides the svydesign() function, which links a data frame with its weight variable.
# install.packages("survey")
library(survey)
df.w <- svydesign(ids = ~1, data = df, weights = ~weight)
The resulting object is no longer a data frame but a list of components, which can be inspected with the attributes() function.
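The listing that follows is what attributes() reports for a design object, presumably obtained with `attributes(df.w)`. The sketch below shows the same inspection on a small stand-in design so that it runs on its own; `toy` and `toy.w` are hypothetical names, and the 44/56 split is an assumption chosen to mirror the example data.

```r
library(survey)

# Hypothetical stand-in data: 44 women, 56 men, weights balancing the two groups
toy <- data.frame(gender = rep(c("Female", "Male"), times = c(44, 56)))
toy$weight <- ifelse(toy$gender == "Female", 50 / 44, 50 / 56)

# Simple design: no clusters (ids = ~1), one weight per respondent
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Component names and class of the design object
print(attributes(toy.w))

# The design is a list, not a data frame ...
print(is.data.frame(toy.w))

# ... but the original data frame is kept in the "variables" component
print(head(toy.w$variables))
```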
## $names
## [1] "cluster" "strata" "has.strata" "prob" "allprob" "call" "variables" "fpc"
## [9] "pps"
##
## $class
## [1] "survey.design2" "survey.design"
Therefore, we need to use survey’s own analytical functions. For example, here is a comparison of the unweighted and weighted sex ratios:
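The two tables below look like the output of prop.table() applied first to a plain table() and then to svytable(). A minimal sketch of that comparison, assuming a deterministic stand-in with 44 women and 56 men (hypothetical names `toy` and `toy.w`) instead of the random sample:

```r
library(survey)

# Hypothetical stand-in data with the same gender split as the example
toy <- data.frame(gender = rep(c("Female", "Male"), times = c(44, 56)))
toy$weight <- ifelse(toy$gender == "Female", 50 / 44, 50 / 56)
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Unweighted proportions: plain counts divided by the sample size
unweighted <- prop.table(table(toy$gender))
print(unweighted)

# Weighted proportions: svytable() sums the weights instead of counting rows
weighted <- prop.table(svytable(~gender, design = toy.w))
print(weighted)
```

With the original objects, the equivalent calls would be `prop.table(table(df$gender))` and `prop.table(svytable(~gender, design = df.w))`.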
##
## Female   Male
##   0.44   0.56
## gender
## Female   Male
##    0.5    0.5
svytable() can also be used to create two-way (or higher-dimensional) frequency and percentage tables. For example, let’s create a contingency table of gender and country, with proportions computed within each country:
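Since each column of the table below sums to 1, it was presumably built with `prop.table(..., margin = 2)` on a two-way svytable(). The sketch below is an assumption-laden reconstruction: the cell counts (26, 18, 26, 30) are not given in the text but inferred from the proportions shown, and `toy`, `toy.w`, `tab` and `col_props` are hypothetical names.

```r
library(survey)

# Hypothetical stand-in data; cell counts inferred from the printed proportions:
# Female/A = 26, Female/B = 18, Male/A = 26, Male/B = 30
cells <- c(26, 18, 26, 30)
toy <- data.frame(
  gender  = rep(c("Female", "Female", "Male", "Male"), times = cells),
  country = rep(c("A", "B", "A", "B"), times = cells)
)
toy$weight <- ifelse(toy$gender == "Female", 50 / 44, 50 / 56)
toy.w <- svydesign(ids = ~1, data = toy, weights = ~weight)

# Weighted two-way table: each cell is the sum of weights in that cell
tab <- svytable(~gender + country, design = toy.w)

# margin = 2 divides by column totals, giving the gender split within each country
col_props <- prop.table(tab, margin = 2)
print(col_props)
```

Using `margin = 1` instead would give the country split within each gender, and omitting `margin` gives each cell's share of the whole weighted sample.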
##         country
## gender           A         B
##   Female 0.5600000 0.4329897
##   Male   0.4400000 0.5670103
Below are other useful functions of the survey package: