PRDH 1852 :: Reaction by Prof. Michael Haan

1852 Oversampling Strategy


The 1852 Census of Upper and Lower Canada:
Proposed Oversampling Strategy, and Discussion

Reaction by Michael Haan, Department of Sociology, University of Toronto

Hi Lisa,

Following up on my promise to send you information on nearest neighbour/hot deck imputation methods, I attach four references: three from Statcan methodologists and one from a couple of academics.

Fellegi, I.P., and D. Holt. 1976. "A Systematic Approach to Automatic Edit and Imputation." Journal of the American Statistical Association 71 (353): 17-35.

Podehl, W.M. 1974. "Introduction to the Generalized Editing and Imputation System using Hot-Deck Approach." Statistics Canada, General Social Surveys Division.

Little, Rod, and Donald Rubin. 1987. Statistical Analysis with Missing Data. New York: Wiley & Sons. See especially pages 65-66, which discuss the nearest neighbour hot deck.

Statistics Canada. 2002. "Imputation of Demographic Variables From the 2001 Census of Population." Paper presented at the UNECE Work Session on Statistical Data Editing (Working Paper #25).

Note, however, that all of these resources refer to individual imputation, not regional imputation. To impute at the regional level, you will first have to create an aggregate dataset (at the level of the subdivision) and sort it by the variables of interest (district, plus any other characteristics you pull off the published counts). Then, for each subdivision with a missing value, take the preceding or succeeding value in the sorted file (usually the preceding one), and merge the completed data back onto the individual-level file.
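As a rough illustration of the steps just described, here is a minimal Python sketch of a sequential hot deck at the subdivision level. Everything in it is made up for the example: the field names (`district`, `published_pop`, `mean_household_size`), the values, and the `imputed` marker column. It also borrows from the preceding record even when that record sits in a different district, which you may or may not want.

```python
from operator import itemgetter

# Hypothetical subdivision-level aggregate; None marks a missing value.
subdivisions = [
    {"district": "B", "subdivision": "B3", "published_pop": 2150, "mean_household_size": None},
    {"district": "A", "subdivision": "A1", "published_pop": 1200, "mean_household_size": 5.8},
    {"district": "B", "subdivision": "B1", "published_pop": 800, "mean_household_size": 6.1},
    {"district": "A", "subdivision": "A2", "published_pop": 1250, "mean_household_size": None},
    {"district": "B", "subdivision": "B2", "published_pop": 2100, "mean_household_size": 5.5},
]

def sequential_hot_deck(rows, sort_keys, target):
    """Sort rows, then fill each missing target with the preceding donor value."""
    ordered = sorted(rows, key=itemgetter(*sort_keys))
    last_donor = None
    completed = []
    for row in ordered:
        filled = dict(row)  # copy so the input rows are left untouched
        if filled[target] is None:
            # Stays None (and unflagged) if no donor precedes it in the sort order.
            filled[target] = last_donor
            filled["imputed"] = last_donor is not None
        else:
            last_donor = filled[target]
            filled["imputed"] = False
        completed.append(filled)
    return completed

completed = sequential_hot_deck(
    subdivisions, ("district", "published_pop"), "mean_household_size"
)

# Merge the completed subdivision values back onto a hypothetical individual file.
individuals = [
    {"person_id": 1, "subdivision": "A1"},
    {"person_id": 2, "subdivision": "A2"},
    {"person_id": 3, "subdivision": "B3"},
]
lookup = {row["subdivision"]: row for row in completed}
merged = [
    {
        **person,
        "mean_household_size": lookup[person["subdivision"]]["mean_household_size"],
        "imputed": lookup[person["subdivision"]]["imputed"],
    }
    for person in individuals
]
```

With this sort order, A2 borrows from A1 and B3 borrows from B2. Sorting on different characteristics would change which subdivision acts as the donor, so the choice of sort variables is the real design decision here.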

Many statisticians warn against doing this, because treating imputed values as if they were observed results in underestimated variances in any regression run on the data.

If you include a flag marking the imputed records, however, people could choose to leave the imputed data out of their analyses.
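To show what the flag buys a user, here is a small Python sketch, assuming a hypothetical individual file with a boolean `imputed` column. The records, field names, and values are invented for illustration.

```python
from statistics import mean

# Hypothetical individual-level records; `imputed` marks hot-deck values.
records = [
    {"person_id": 1, "mean_household_size": 5.8, "imputed": False},
    {"person_id": 2, "mean_household_size": 5.8, "imputed": True},
    {"person_id": 3, "mean_household_size": 6.1, "imputed": False},
    {"person_id": 4, "mean_household_size": 5.5, "imputed": True},
]

def observed_only(rows, flag="imputed"):
    """Keep only the records whose value was actually observed."""
    return [row for row in rows if not row[flag]]

# The same summary, computed with and without the imputed records.
mean_all = mean(r["mean_household_size"] for r in records)
mean_observed = mean(r["mean_household_size"] for r in observed_only(records))
```

A user who is worried about the variance problem can run the analysis both ways and compare the results.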

Last updated: 2/10/2021
