Is it possible to reliably identify correctional institution inmates in the 1920, 1930, and 1940 100% samples?

I have tried to identify correctional institution inmates in the 1920, 1930, and 1940 100% samples by combining the variables “gqtype” (correctional institution = 2) and “relate” (institutional inmates = 13). However, aggregating the individual-level records to the state level yields, in most cases, estimates very different from (typically far below) the numbers in published census reports. Are there any known coding errors or inconsistencies in these variables in these samples that could account for the discrepancies, for example, correctional institutions coded as non-correctional institutions on “gqtype”, or institutional inmates not coded as such on “relate”? I have also tried combining “serial” and “pernum” with “gqtype”, but that does not solve the problem. Is there a way to combine information from other variables to accurately identify correctional institution inmates?

I think including information from the RELATE variable may be distorting your calculations. This is because people living in group quarters (e.g. correctional institutions) are typically sampled as individuals. Therefore, they may be coded as the “householder” (e.g. RELATE == 1). I’d suggest simply using the GQTYPE variable to identify the correctional institution population in the 1920, 1930, and 1940 samples. Additionally, the detailed version of GQTYPE (GQTYPED) identifies the type of correctional institution more specifically: federal/state correctional, prison, penitentiary, military prison, local correctional, jail, etc.
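As a rough illustration of the GQTYPE-only approach, here is a minimal Python sketch. It assumes the extract has already been loaded as a list of records; the field names `gqtype` and `statefip`, the helper name, and the toy records are all hypothetical, and the code 2 for correctional institutions is taken from the question above rather than checked against the codebook.

```python
from collections import Counter

# General GQTYPE code for correctional institutions, as quoted in the question
# (an assumption here; verify against the codebook for each sample).
CORRECTIONAL_GQTYPE = 2


def correctional_counts_by_state(records):
    """Count person records per state whose GQTYPE marks a correctional institution.

    RELATE is deliberately ignored, following the suggestion above.
    """
    counts = Counter()
    for rec in records:
        if rec["gqtype"] == CORRECTIONAL_GQTYPE:
            counts[rec["statefip"]] += 1
    return dict(counts)


# Toy records, made up purely for illustration.
sample = [
    {"statefip": 1, "gqtype": 2},
    {"statefip": 1, "gqtype": 2},
    {"statefip": 1, "gqtype": 1},
    {"statefip": 2, "gqtype": 2},
]

print(correctional_counts_by_state(sample))
```

The resulting state totals can then be compared directly with the published census figures.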

Thank you for the suggestion. However, I am afraid it does not solve the problem: while correctional inmates are sampled as individuals, institutional employees and their families are sampled as households. To take just one example from the 1940 100% sample, one finds the following records:

gqtyped  age  serial  pernum  relate
Jail     42   34738   1       Head
Jail     40   34738   2       Spouse
Jail     16   34738   3       Child
Jail     14   34738   4       Child
Jail     12   34738   5       Child

So simply using the GQTYPE (or GQTYPED) variable to identify the correctional institution population leads to the inclusion of individuals who are clearly not institutional inmates.
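To gauge how widespread this pattern is, one could tabulate RELATE and the number of person records per SERIAL among records flagged as correctional. The Python sketch below is only a diagnostic under stated assumptions, not a rule for separating inmates from staff families: the field names, the helper functions, and the use of value labels instead of numeric codes are hypothetical, and the rows are the five records from the example above.

```python
from collections import Counter

# The five records from the 1940 example, with value labels instead of codes.
rows = [
    {"gqtyped": "Jail", "age": 42, "serial": 34738, "pernum": 1, "relate": "Head"},
    {"gqtyped": "Jail", "age": 40, "serial": 34738, "pernum": 2, "relate": "Spouse"},
    {"gqtyped": "Jail", "age": 16, "serial": 34738, "pernum": 3, "relate": "Child"},
    {"gqtyped": "Jail", "age": 14, "serial": 34738, "pernum": 4, "relate": "Child"},
    {"gqtyped": "Jail", "age": 12, "serial": 34738, "pernum": 5, "relate": "Child"},
]


def relate_profile(records, gq_labels=("Jail",)):
    """Tabulate RELATE labels among records whose GQTYPED label is in gq_labels."""
    return Counter(r["relate"] for r in records if r["gqtyped"] in gq_labels)


def persons_per_serial(records, gq_labels=("Jail",)):
    """Count person records per SERIAL among the selected records."""
    return Counter(r["serial"] for r in records if r["gqtyped"] in gq_labels)


profile = relate_profile(rows)
sizes = persons_per_serial(rows)

print(profile)  # how many selected records carry each RELATE label
print(sizes)    # how many person records share each SERIAL
```

A large share of Spouse or Child labels, or many multi-person serials, would indicate how much of the GQTYPE-based count consists of household-style records like the one shown here.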