Inference with data coming from multiple frames: the use of auxiliary information

  1. David Molina Muñoz
Supervised by:
  1. Antonio Arcos Cebrián Director
  2. María del Mar Rueda García Director

Defence university: Universidad de Granada

Year of defence: 2016

Committee:
  1. Ana María Aguilera del Pino Chair
  2. Patricia Román Román Secretary
  3. Enrique Francisco González Dávila Committee member
  4. Anne Ruiz Gazen Committee member
  5. María Dolores Ugarte Martínez Committee member

Type: Thesis

Abstract

Multiple frame surveys were first introduced by Hartley (1962) as a device for reducing data collection costs without affecting the accuracy of the results with respect to single frame surveys. In a multiple frame survey, Q >= 2 sampling frames are available for sampling. Although each of them may be incomplete, it is assumed that, taken together, they cover the entire target population. Independent samples are then selected, one from each frame, possibly under different sampling designs, and the information is suitably combined to obtain estimates. Since its emergence, multiple frame sampling theory has developed considerably, and a number of estimators for the total of a continuous variable have been proposed. The first proposals were formulated in a dual frame context, i.e. in the case where two frames are available for sampling. Hartley (1962) himself proposed the first dual frame estimator, which was improved by Lund (1968) and Fuller and Burmeister (1972). Bankier (1986), Kalton and Anderson (1986) and Skinner (1991) proposed dual frame estimators based on new techniques. Skinner and Rao (1996) and Rao and Wu (2010) applied likelihood methods to compute estimators that perform well in complex designs. More recently, Ranalli et al. (2015) and Elkasabi et al. (2015) used calibration techniques to derive estimators in the dual frame context. In recent years, a number of works focusing on estimation with three or more sampling frames have appeared. Lohr and Rao (2006) extended some of the estimators proposed so far to the multiple frame set-up. Mecatti (2007) used a new approach based on the multiplicity of each unit (i.e. the number of frames in which the unit is included) to propose an estimator that is easy to compute. Multiplicity is also used by Rao and Wu (2010) to provide an extension of the pseudo empirical likelihood estimator to the case of more than two frames.
In 2011, Singh and Mecatti suggested a class of multiplicity estimators that encompasses all the multiple frame estimators available in the literature by suitably specifying a set of parameters. However, little attention has been devoted to the study of qualitative variables in a multiple frame context. Qualitative variables are needed to properly represent the responses to multiple choice questions, which are quite frequent in surveys. An important contribution of this thesis is the formulation of estimators for the proportions of response variables with discrete outcomes. Estimators for proportions of both multinomial and ordinal response variables are proposed. On the other hand, the benefits of the multiple frame approach have increased its popularity among the scientific community, and this methodology is now widely used when conducting surveys. The use of dual frame designs in telephone surveys is particularly noteworthy. In some subject areas (e.g., electoral studies), face-to-face surveys have been completely displaced by telephone interviewing. Telephone surveys present some drawbacks with regard to coverage, due to the absence of a telephone in some households and the widespread use of mobile phones, which are sometimes replacing fixed (land) lines entirely. Dual frame telephone surveys that combine Random-Digit-Dialing (RDD) landline telephone samples and cell phone samples are a good solution to that issue since they reduce the noncoverage due to cell-only households in RDD landline telephone surveys. Therefore, in those situations, software for analyzing data coming from dual frame surveys is very useful. No existing software covered dual frame estimation procedures until Frames2, another important contribution of this thesis, was released. Frames2 is an R package for point and interval estimation in a dual frame context which implements the main estimators for dual frame data proposed so far.
This thesis is presented as a compendium of four publications related to its contents. Three of the papers have already been published in specialized journals and the fourth has been submitted for review. The full versions of the papers are included in Appendices A1 - A4. Preceding the appendices, a series of chapters summarizes the most important aspects of the papers to facilitate their reading. The first chapter introduces the problem of estimation in a multiple frame context and reviews the existing approaches for estimating parameters from multiple frame survey data. Chapter 2 enumerates the objectives of the thesis. The methodology used and the most relevant results obtained are presented in Chapters 3 and 4, respectively. Chapter 5 summarizes the conclusions derived from the results. Finally, Chapter 6 (in Spanish) and Chapter 7 (in English) provide some notes on ongoing research related to the topics addressed in the thesis.