CSES Module 1 Data Set Errata
Posted: October 1, 2002
STATA Data Types/Storage Formats: Data Descriptor Statements
Some variables in the CSES Module 1 dataset, in particular the weights and some of the identification variables, require larger data types/storage formats than were provided in the original STATA data descriptor statements. The result is that some variables are inappropriately rounded when reading the data into STATA.
This can have a number of adverse effects. One example is that some of the longer identification variables, particularly A1003 and A1009, may no longer be unique (due to rounding), and so merges based on those variables will not match records correctly. Additionally, the weight variables for some countries may in fact carry more precision than the designated storage format can hold.
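To illustrate the rounding problem outside STATA, here is a minimal Python sketch. It assumes the affected variables were being held in a 4-byte floating-point type, which represents integers exactly only up to 16,777,216 (2^24). The identifier values are made up for the illustration and are not actual CSES values.

```python
import struct


def to_float32(value: float) -> float:
    """Round a number to the nearest 4-byte float, then widen it back."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Made-up 8-digit identifiers (hypothetical, not actual CSES values).
ids = [80000001, 80000002, 80000003]

# What each identifier becomes when squeezed into 4-byte float storage.
stored = [to_float32(i) for i in ids]

# Count how many distinct values exist before and after the conversion.
distinct_before = len(set(ids))
distinct_after = len(set(stored))
print(distinct_before, distinct_after)
```

In this range, neighbouring 4-byte floats are 8 apart, so the three distinct identifiers collapse to a single stored value. A merge keyed on such a variable can no longer tell those records apart.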
Solution: STATA users will want to use a text editor to revise the STATA data descriptor file ‘cm1_col.dct’ so that the proper data types/storage formats are used, as shown here (excerpted and revised from file ‘cm1_col.dct’):
long   A1003    30- 37
double A1009    80- 89
double A1010_1  91- 101
double A1010_2  103- 112
double A1010_3  114- 124
double A1011_1  126- 135
double A1011_2  137- 146
double A1011_3  148- 157
double A1012_1  159- 169
double A1012_2  171- 181
double A1012_3  183- 193
double A1013    195- 204
double A1014_1  206- 215
double A1014_2  217- 227
double A1014_3  229- 239
For your convenience, we have a revised version of the file available for download here: cm1_col.dct.
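As an independent check outside STATA, the identification variables can be read straight from the ASCII data using the column ranges in the excerpt above. The Python sketch below assumes those ranges are 1-based and inclusive, and that the identifiers are meant to be unique per record, as the text implies. The sample record and the helper names `read_field` and `count_duplicates` are hypothetical.

```python
# 1-based, inclusive column ranges copied from the dictionary excerpt.
COLUMNS = {"A1003": (30, 37), "A1009": (80, 89)}


def read_field(record: str, name: str) -> str:
    """Return the raw text of one variable from a fixed-width record."""
    start, end = COLUMNS[name]
    return record[start - 1:end].strip()


def count_duplicates(records, name):
    """Count records whose identifier repeats an earlier one.

    Python integers are exact, so no rounding occurs here.
    """
    values = [int(read_field(r, name)) for r in records]
    return len(values) - len(set(values))


# Hypothetical record: blanks everywhere except the two identifier fields.
record = " " * 29 + "80000001" + " " * 42 + "8000000123"

print(read_field(record, "A1003"))
print(read_field(record, "A1009"))
```

If `count_duplicates` reports zero for the raw file but STATA reports duplicates after reading the data, the storage types in the dictionary file are the likely cause.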
NOTE 1:
After editing ‘cm1_col.dct’, most users will want to read in their ASCII data again using the revised STATA statements, so that the revised data types/storage formats for these variables are applied. Users who choose to read and merge in only the affected variables will need to re-apply errata, merges, or other corrections that relied on identification variables A1003 and/or A1009.
NOTE 2:
This revision makes greater demands on your computer’s memory, so depending on your computing resources and the version of STATA you are using, you may also want to use a text editor to alter the ‘set memory’ command in the file ‘cm1_run.do’. The relevant line in ‘cm1_run.do’ appears as ‘set memory 60m’. We recommend setting the value to ‘65m’ or higher, which has worked well for us.
NOTE 3:
This is the first CSES release for which STATA statements have been provided. If you encounter any errors in the STATA data descriptor statements, or in using CSES files in STATA in general, please send a detailed description of the problem by e-mail to [email protected] so that we may investigate. Thank you!