Understanding Goodness-of-Fit
(1) Summary assessment at ED-level
(2) Cellular fit at ED-level
(3) Summary assessment at ward level
(4) Tabular & cellular fit at ward level
(5) Recreation of constraints from extracted microdata
See Huang and Williamson (2001) for a more detailed treatment of measures of fit.
1) Summary assessment of fit at ED-level
A wide variety of reports on the fit of Pop91-generated synthetic microdata to known constraints is available. Summary measures for each synthetic ED population are placed in district-specific report files named districtcode_Ged.dat (where districtcode = the first four characters of the 8-character ONS ED name). These summary reports may be downloaded in zipped format from the project website. An example file for one district is also available.
The summary measures contained in districtcode_Ged.dat for each ED are:
EDCode - ONS 8-character name
Time - CPU seconds taken to estimate data
Evals - no. of potential household replacements evaluated
NoOfRep - no. of actual household replacements made
NFT - no. of statistically non-fitting tables
NFC - no. of statistically non-fitting cells
PFC - no. of statistically non-fitting cells after allowance for the ±1 impact of pseudo-random pre-release SAS data modification (barnardisation)
OTAE - Overall Total Absolute Error (sum of absolute error across all cells)
ORSSZ - Overall Relative Sum of Squared Z-scores
TAE_X - Total Absolute Error associated with table X (tables numbered in the order listed in Pop91\Pop91CO_t17\Nm.dat)
RSSZ_X - Relative Sum of Squared Z-scores associated with table X
Temp - final value of ‘temperature’, a simulated annealing control parameter
AreaP - % of households in final combination drawn from the SAR region within which the ED is located
NoOfh - no. of households in synthetic ED
As a rule-of-thumb, synthetic microdata for EDs with NFT > 0, PFC > 0 or OTAE > 250 should be treated with some caution.
2) Cellular fit at ED-level
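Cell-level inspection is most worthwhile for EDs that the summary measures have already flagged. As a minimal sketch of that screening step, the rule-of-thumb (NFT > 0, PFC > 0 or OTAE > 250) could be applied as below. It assumes the summary measures for each ED have already been read into a dictionary keyed by the field names listed for districtcode_Ged.dat. The record layout, the function names and the sample values are illustrative assumptions and are not part of Pop91.

```python
# Hypothetical screening helper: the thresholds follow the stated
# rule-of-thumb, everything else is an assumption made for illustration.
OTAE_THRESHOLD = 250


def needs_caution(summary):
    """Return True if an ED summary record trips the rule-of-thumb."""
    return (
        summary["NFT"] > 0
        or summary["PFC"] > 0
        or summary["OTAE"] > OTAE_THRESHOLD
    )


def flag_eds(summaries):
    """Return the EDCode of every record that should be treated with caution."""
    return [s["EDCode"] for s in summaries if needs_caution(s)]


# Made-up records (placeholder ED codes and values, not real Pop91 output).
example_summaries = [
    {"EDCode": "XXXXAA01", "NFT": 0, "PFC": 0, "OTAE": 120},
    {"EDCode": "XXXXAA02", "NFT": 1, "PFC": 0, "OTAE": 90},
    {"EDCode": "XXXXAA03", "NFT": 0, "PFC": 0, "OTAE": 310},
    {"EDCode": "XXXXAA04", "NFT": 0, "PFC": 2, "OTAE": 40},
]

flagged = flag_eds(example_summaries)
print(flagged)
```

Only the first placeholder record passes all three checks; the other three are flagged for closer cell-level inspection.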
For each estimated ED a results file (EDCODE.est) is available detailing both the target constraining counts and their synthetic counterparts (the target counts are Crown Copyright, and are made available under licence). In raw form these output files are given as table vectors. The program Reformat_estimates.exe reconfigures this information into standard SAS table format (including header and stub labels), making the cellular fit achieved easier to visualise and easier to compare with published 1991 Census SAS tables.
Each reformatted output file includes the following information for each of the 14 tables used as constraints on the synthetic population estimation process:
a) constraining SAS counts
NOTE: In some cases these counts are not the ‘raw’ ONS counts, but modelled/revised counts generated to overcome problems of either data inconsistency between tables or 10% sampling.
b) estimated synthetic counts [derived by aggregating synthetic microdata for the ED of interest]
c) Error (synthetic – target counts)
d) Z-scores
A Z-score of magnitude >= 1.96 is taken to denote a non-fitting cell.
3) Summary assessment of fit at ward-level
It is possible that systematic biases in synthetic microdata that are undetectable at ED-level might cause problems when data are aggregated to ward-level. Consequently a set of summary ward-level assessments of fit has been produced, comparing synthetic microdata, aggregated to ward-level, with known ED-level SAS constraints, also aggregated to ward-level.
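The aggregation step described above can be sketched as follows. This is an illustration only. It assumes that the target and synthetic counts for each ED are already available as equal-length lists of cell counts, and that the user supplies a lookup from ED code to ward code. The function names, the lookup, the codes and the numbers are all hypothetical and are not taken from Pop91.

```python
def aggregate_to_ward(ed_counts, ed_to_ward):
    """Sum per-ED cell-count vectors into one vector per ward."""
    ward_counts = {}
    for ed_code, counts in ed_counts.items():
        ward = ed_to_ward[ed_code]
        running = ward_counts.get(ward, [0] * len(counts))
        ward_counts[ward] = [a + b for a, b in zip(running, counts)]
    return ward_counts


def total_absolute_error(synthetic, target):
    """Sum of absolute differences across all cells."""
    return sum(abs(s - t) for s, t in zip(synthetic, target))


# Made-up lookup and counts for three EDs in two wards.
ed_to_ward = {"ED1": "W1", "ED2": "W1", "ED3": "W2"}
target = {"ED1": [10, 5, 3], "ED2": [8, 6, 2], "ED3": [4, 4, 4]}
synthetic = {"ED1": [11, 5, 2], "ED2": [9, 5, 2], "ED3": [4, 3, 5]}

ward_target = aggregate_to_ward(target, ed_to_ward)
ward_synthetic = aggregate_to_ward(synthetic, ed_to_ward)

# Total Absolute Error per ward, computed on the aggregated counts.
ward_tae = {
    ward: total_absolute_error(ward_synthetic[ward], ward_target[ward])
    for ward in ward_target
}
print(ward_tae)
```

In this made-up example the first cell is over-estimated by 1 in both ED1 and ED2, so the error grows to 2 once the two EDs are combined into ward W1. This is the kind of same-direction bias that a ward-level check is intended to expose.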
Files named DISTRICTCODE_Gwd.test provide the following measures of overall fit, based on a summary of cellular and tabular fit across all tables:
WDCODE - ONS four-character ward identifier
ATAE - average and standard deviation of Total Absolute Error across all tables
NOTFT, SD - average number of statistically non-fitting tables
NOTFC1, SD - average number of statistically poorly fitting cells
NOTFC2, SD - average number of statistically non-fitting cells
ATAE/h - average of (Total Absolute Error / no. of households in ward)
Noofh - no. of households in ward
4) Tabular & cellular fit at ward-level
The summary assessment of fit for data aggregated to ward level is supplemented by more detailed assessments at both tabular and cellular level, as outlined below.
a) Cellular fit
Files named WDCODE_Cwd.test contain the following measures of cellular fit at the aggregated ward-level, reported for each cell count in each constraining table:
· synthetic count (ward-level total)
· maximum and minimum estimated synthetic counts over k runs [these will be identical to the synthetic count, as Pop91 uses only 1 run per ED]
· 5th and 95th percentile synthetic counts over k runs [these will be identical to the synthetic count, as Pop91 uses only 1 run per ED]
· Z-score
· mean Z-score over k runs
· poorly fitting cells (|Z| > 1.96)
· non-fitting cells (|Z| > 1.96 even if ±1 added to SAS count to allow for effects of barnardisation)
b) Ta