PRDH 1852 :: Flat Sample

The 1852 Census of Upper and Lower Canada:
Proposed Oversampling Strategy, and Discussion

Flat Sample

The 1852 Census Project, which aims to create a 10% microdata sample of this census, has begun with a flat 10% sample of the existing manuscript pages. We have finished the flat 10% sample of Upper Canada and have almost finished that of Lower Canada. To date, we have entered 158,150 persons into the database, or 158,150 / 1,329,638 = 11.89% of the persons available for data transcription. This figure exceeds 10% because we also entered the remaining members of dwellings who were recorded on a subsequent set of pages. At a later stage, we will identify, remove and archive the partial dwellings captured at the top of each sampled page.
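The arithmetic above can be checked with a short sketch. The two counts come from the text; the function and variable names are illustrative only, and the "excess" figure is simply the difference from a strict 10% of the available persons, not a number reported by the project.

```python
# Counts quoted in the text above.
entered_persons = 158_150        # persons entered into the database so far
available_persons = 1_329_638    # persons available for data transcription


def sampling_fraction(sampled: int, available: int) -> float:
    """Share of the available persons that has been entered."""
    return sampled / available


fraction = sampling_fraction(entered_persons, available_persons)

# Illustrative only: how many entered persons exceed a strict 10% target.
excess = entered_persons - round(0.10 * available_persons)

print(f"Realized sampling fraction: {fraction:.2%}")
print(f"Persons beyond a strict 10%: {excess:,}")
```

The excess is an upper bound on what will be removed later only if the over-entry is entirely due to dwellings continued across pages, which is the explanation given above.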

If we consider the total population of Upper and Lower Canada in 1852, our 10% sample represents only about 8.6% of that population (and even less once we remove the dwelling fragments from the top of each page). The sub-districts and divisions which are present in the manuscript record are not representative of the whole population, so the sample is biased. For instance, most large cities of Upper and Lower Canada, including most of Montréal and Toronto, are missing, as are some of the smaller cities, such as Kingston, London and St. Catharines. We cannot compensate for the lack of large-city dwellers in our sample, except by adding to the database the 100% sample of Québec City prepared by Marc St.-Hilaire and his colleagues at Université Laval (weighted accordingly), and perhaps by oversampling the St. Louis Ward of Montréal, all of the city of Hamilton, and the Ottawa East and West wards (if funds and time permit).
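One way to read "weighted accordingly" is as inverse-probability (design) weighting: a record drawn at a nominal 1-in-10 rate stands for ten persons, while a record from a 100% sample stands for one. The sketch below assumes nominal sampling fractions and uses made-up record counts; the stratum names and the helper function are hypothetical, and the project's actual weights may differ (for example, to reflect the realized 11.89% rate).

```python
def design_weight(sampling_fraction: float) -> float:
    """Inverse-probability weight for a record sampled at the given rate."""
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling fraction must be in (0, 1]")
    return 1.0 / sampling_fraction


# Hypothetical strata with made-up record counts, for illustration only.
strata = {
    "flat_sample": {"fraction": 0.10, "records": 200},
    "quebec_city_full": {"fraction": 1.00, "records": 50},
}

# Weighted count of persons each stratum's records stand for.
estimated_persons = {
    name: s["records"] * design_weight(s["fraction"])
    for name, s in strata.items()
}

for name, total in estimated_persons.items():
    print(f"{name}: {total:.0f} persons represented")
```

The same function would apply to any oversampled ward: a ward sampled at a higher rate simply receives a proportionally smaller weight.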

We should, however, find a way to compensate for the absence of the other rural sub-districts and divisions, creating a microdata sample useful for the study of rural Upper and Lower Canada in 1852. We have enough remaining data entry funds to pay for additional rural oversamples. My key question to my colleagues is: would it be reasonable to compensate for the missing rural sub-districts and divisions by oversampling neighbouring sub-districts and divisions which bear similar socio-demographic characteristics? I do not know of a similar project which has taken this approach, but the logic is similar to the hot-deck procedures used with the IPUMS data and contemporary statistical data to impute missing values for particular variables.
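To make the question concrete, the following is a minimal sketch of how a "donor" sub-district might be chosen for each missing one, in the spirit of nearest-neighbour hot-deck donor selection. It is not the IPUMS procedure and not a method the project has adopted. It assumes that aggregate characteristics are available for the missing sub-districts from some other source, and every name, profile value, neighbour list and scale below is hypothetical.

```python
from math import sqrt

# Hypothetical aggregate profiles: (share of farm households, mean household size).
surviving = {
    "A": (0.80, 6.1),
    "B": (0.35, 5.2),
    "C": (0.60, 5.8),
}
missing = {
    "X": (0.75, 6.0),
    "Y": (0.40, 5.3),
}
# Hypothetical lists of surviving sub-districts that border each missing one.
neighbours = {"X": ["A", "C"], "Y": ["B", "C"]}

# Hypothetical scales that put both characteristics on a comparable footing.
scales = (0.2, 0.5)


def distance(p, q, scales):
    """Scaled Euclidean distance between two aggregate profiles."""
    return sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(p, q, scales)))


def choose_donor(target, candidates, profiles, scales):
    """Pick the neighbouring sub-district whose profile is closest to the target."""
    return min(candidates, key=lambda name: distance(target, profiles[name], scales))


donors = {
    name: choose_donor(profile, neighbours[name], surviving, scales)
    for name, profile in missing.items()
}
print(donors)
```

Each chosen donor would then be sampled at a higher rate to stand in for its missing neighbour. Whether such substitution is defensible is exactly the question posed above.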

Last updated: 2/10/2021
