In relation to the variable "CITY", why do so many individuals fall in the category "Not in identifiable city"?

From 1960 to 2012, for 84% of the individuals it is not possible to determine the city of residence. Is there another way to get info on the city of residence?

The CITY variable does not identify every “city” in the United States, but rather a subset of some of the largest cities. Whether or not a city is included largely depends on population size. In 1980 most cities with populations of 100,000+ were included. In 1990, Public Use Microdata Areas (PUMAs) were introduced as the smallest identifiable area in the PUMS. In that sample and every sample since, a city is included if its boundaries match up with the boundaries of its component PUMAs. This means that if a city contained five PUMAs, but one of those PUMAs only identified part of the city, the city is not identifiable and all of its residents would be coded as “Not in identifiable city.” The Comparability Statement for CITY includes more information on specific years.

Because cities can be fairly small, listing every individual’s city of residence could make some individuals identifiable. If you are interested in specific cities that are not identifiable, you can use the PUMA maps to approximate that city. If you are more interested in how many people live in or around cities, I would recommend looking into the METRO and METAREA variables.
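
As a rough illustration of the PUMA approach, here is a minimal Python sketch. It assumes your extract includes the STATEFIP, PUMA, and PERWT variables. The rows and the PUMA codes listed for the city are made up for the example, so you would need to replace them with the PUMAs you read off the PUMA maps for your city and sample year. PUMA codes are only unique within a state, which is why the sketch matches on the state and PUMA pair.

```python
# Hypothetical extract rows. STATEFIP, PUMA, CITY and PERWT are IPUMS USA
# variable names; every value below is invented for illustration.
records = [
    {"STATEFIP": 27, "PUMA": 1401, "CITY": 0, "PERWT": 120.0},
    {"STATEFIP": 27, "PUMA": 1402, "CITY": 0, "PERWT": 180.0},
    {"STATEFIP": 27, "PUMA": 1500, "CITY": 0, "PERWT": 200.0},
    {"STATEFIP": 55, "PUMA": 1401, "CITY": 0, "PERWT": 100.0},
]

# Placeholder set of (STATEFIP, PUMA) pairs that approximate the city.
# Replace with the PUMAs taken from the PUMA maps for your sample year.
CITY_PUMAS = {(27, 1401), (27, 1402)}


def in_approx_city(row, pumas=CITY_PUMAS):
    """Return True if the row falls in one of the PUMAs chosen for the city."""
    return (row["STATEFIP"], row["PUMA"]) in pumas


def weighted_population(rows):
    """Sum the person weights (PERWT) to estimate a population count."""
    return sum(row["PERWT"] for row in rows)


approx_city_rows = [row for row in records if in_approx_city(row)]
approx_city_pop = weighted_population(approx_city_rows)
share_of_sample = approx_city_pop / weighted_population(records)

print(f"Approximate city population: {approx_city_pop:.0f}")
print(f"Share of all weighted persons: {share_of_sample:.1%}")
```

Keep in mind that this only approximates the city: any PUMA that extends beyond the city limits will pull in residents from outside it.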

I hope this helps.

Thank you very much for your help, I really appreciate it.

If I can ask one last question, is there a way to get the data for the variable “CITY” for the years 1970 and 1960? The following link, https://usa.ipums.org/usa-action/variables/group/h-geog, reports that the variable is “available in other samples in this year”, but when I check the availability (https://usa.ipums.org/usa-action/variables/CITY#availability_section) I cannot find anything for those years.

Thank you anyway for your time.

Unfortunately, those little blue "i"s are incorrect in this case. This is a known issue and it is being addressed. In the meantime, the Availability statement should always be considered the most accurate representation of the availability of a variable.