Kruuk et al. (1990) used a stratified random sampling design to estimate the number of otter (Lutra lutra) dens or holts along a 1400 km coastline of the Shetland Islands. The coastline was divided into 237 habitable 5 km sections, with each section classified as one of four types: cliffs over 10 m (89 sections), agricultural (61 sections), peat (40 sections), and non-peat (47 sections). The sample featured here was collected using stratified random sampling, but for this example it will be treated as if it had been collected using simple random sampling. The data are available in the SDaA package. If you have not done so already you will need to install this package using install.packages("SDaA").

Here are the first six observations:

    library(SDaA)
    head(otters)
      section habitat holts

Each observation includes the section number, the type of habitat, and the number of holts. Although not necessary for the analysis, it would be helpful to have more descriptive stratum names. This can be done by changing habitat into a factor using the factor function and assigning labels to the strata.

    otters$habitat <- factor(otters$habitat, labels = c("cliffs", "agricultural",

Before specifying the design it is necessary to include within the data frame the population size for the finite population correction. This can be done by creating a variable N.

    otters$N <- 237

First load the survey package, installing it first if necessary using the command install.packages("survey").

    library(survey)

The svydesign function in the survey package is used to create a survey design object that includes information about the design and the data. A simple random sampling design can be specified as follows.

    mydesign <- svydesign(ids = ~1, data = otters, fpc = ~N)

The argument ids describes whether and how elements are grouped within sampling units, such as in cluster sampling. For simple random sampling, specify ids = ~1 to indicate one element per sampling unit. The argument data = otters indicates the data frame. The argument fpc = ~N indicates the variable that gives the size of the population from which the unit was sampled. In cases where the finite population correction can be safely ignored, such as when \(1-n/N \approx 1\) or when sampling is with replacement, this argument can be omitted. Notice that the variables are always preceded by a tilde (~). This is a convention of the survey package but is not typical of other packages in R, so it is easy to forget to use it.

The object mydesign now includes the data as well as information about the design. Inferences can be made by applying special functions to the object created by svydesign.

An estimate for the population mean \(\mu\) and its associated standard error can be computed using the svymean function. Recall that the estimator of \(\mu\) is simply the sample mean \(\bar{y}\).

For a domain \(d\), let \(d_i = 1\) if unit \(i\) belongs to the domain and \(d_i = 0\) otherwise. The estimator of the domain total \(\tau_d\) is then
\[
  \hat{\tau}_d = \frac{N}{n}\sum_{i=1}^n d_i y_i,
\]
so that the estimator for \(\tau_d\) when \(N_d\) is unknown is equivalent to the estimate of the population total of \(y_i^* = d_i y_i\). This is the estimator that is reported by svyby with the FUN = svytotal argument, but it can also be computed by applying svytotal to the variable \(y_i^*\).

Post-stratification is usually discussed in textbooks in the context of stratified random sampling. But since it is by definition applied to data that were not collected using a stratified random sampling design, such as a simple random sample, it is natural to discuss it here. The otter data can be post-stratified as follows.

    pop <- data.frame(habitat = c("cliffs", "agricultural", "peat", "non-peat"),
      Freq = c(89, 61, 40, 47))
    mydesign <- postStratify(design = mydesign, strata = ~habitat, population = pop)

The population argument is specified as a data frame pop that must contain (a) the exact names of the strata and (b) the total number of sampling units (Freq) in each stratum (i.e., \(N_1, N_2, \dots, N_4\)). The function postStratify re-weights the observations to match the given frequencies, as can be seen below.

    aggregate(weights(mydesign) ~ habitat, data = otters, sum)
           habitat weights(mydesign)

Now functions such as svymean and svytotal can be applied to the post-stratified design.

    svymean(~holts, design = mydesign)
            mean     SE
    holts 4.1549 0.3256

    svytotal(~holts, design = mydesign)
          total SE

Domain/stratum means and totals can also be estimated from the post-stratified design.

    svyby(~holts, by = ~habitat, design = mydesign, FUN = svymean)
                      habitat    holts        se
    agricultural agricultural 1.750000 0.4634253

The estimates and standard errors of the domain means are the same as those obtained earlier, but the estimates and standard errors for the domain totals are different.
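The re-weighting attributed to postStratify can be sketched by hand in base R, without the survey package. This is only a minimal illustration, assuming a simple random sample in which every unit in post-stratum \(h\) receives the weight \(N_h/n_h\). The stratum sizes are the ones given in the text, but the data frame toy and its holt counts are hypothetical values made up for the example and are not the otter data.

```r
# Population stratum sizes N_h as given in the text (237 sections in total).
pop <- data.frame(
  habitat = c("cliffs", "agricultural", "peat", "non-peat"),
  Freq    = c(89, 61, 40, 47),
  stringsAsFactors = FALSE
)

# Hypothetical toy sample (NOT the otter data): habitat and holts per section.
toy <- data.frame(
  habitat = c("cliffs", "cliffs", "agricultural", "agricultural",
              "peat", "peat", "peat", "non-peat"),
  holts   = c(2, 4, 1, 3, 10, 12, 14, 6),
  stringsAsFactors = FALSE
)

# Sample size n_h in each post-stratum, in the same order as the rows of pop.
n_h <- as.vector(table(toy$habitat)[pop$habitat])

# Post-stratified weight N_h / n_h attached to every sampled unit.
toy$w <- (pop$Freq / n_h)[match(toy$habitat, pop$habitat)]

# Within each post-stratum the weights now sum to N_h.
w_sums <- tapply(toy$w, toy$habitat, sum)[pop$habitat]

# Post-stratified estimates of the population total and mean of holts.
total_ps <- sum(toy$w * toy$holts)
mean_ps  <- total_ps / sum(toy$w)

# The same mean written as a weighted average of stratum means, sum (N_h / N) * ybar_h.
ybar_h   <- tapply(toy$holts, toy$habitat, mean)[pop$habitat]
mean_alt <- sum(pop$Freq / sum(pop$Freq) * ybar_h)
```

Comparing w_sums with pop$Freq shows the sense in which the weights are made to match the given frequencies, and mean_ps agrees with mean_alt because the weighted mean reduces to a weighted average of the stratum sample means. Standard errors are deliberately left out of this sketch.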