For several years, I worked for the World Bank analysing early grade education outcomes in a number of countries in the East Asia and Pacific region, including Laos, Tonga and Papua New Guinea. Recently, our earlier work in Papua New Guinea was published for the first time.
One of the more challenging things I did was model a difficult set of survey outcomes: reading amongst young children. You can see the reports here. Two of the most interesting relationships we observed were the importance of language for young children learning to read (Papua New Guinea has over 850 languages, so this matters) and the role that both household and school environments play in literacy development.
At some point I will write a post about the choice between the standard ordinary least squares (OLS) regressions used in the field and the tobit models I (generally) prefer for this data. Understanding the theoretical difference between censored, truncated and continuous data isn’t the most difficult thing in the world, but how you handle that difference in practice can have a big impact on modelling outcomes.
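To give a rough sense of that practical difference, here is a minimal sketch in Python using simulated data, not the survey data. Everything in it is an assumption made for illustration: the parameter values, the floor at zero (standing in for children who score nothing on a reading task), and the helper names `ols`, `residual_sd` and `tobit_loglik`. It fits a simple OLS line to the same underlying relationship observed three ways (continuous, censored at zero, and truncated at zero), then evaluates a tobit log-likelihood on the censored data.

```python
import math
import random


def ols(xs, ys):
    """Return (intercept, slope) from a simple least squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_sd(a, b, xs, ys):
    """Residual standard deviation of the line y = a + b*x."""
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def norm_logcdf(z):
    """Log of the standard normal CDF, floored to avoid log(0)."""
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def tobit_loglik(a, b, sigma, xs, ys, lower=0.0):
    """Log-likelihood of a tobit model left-censored at `lower`."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = a + b * x
        if y <= lower:
            # Censored observation: we only know the latent value is <= lower.
            total += norm_logcdf((lower - mu) / sigma)
        else:
            # Uncensored observation: ordinary normal density.
            z = (y - mu) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return total


# Hypothetical data-generating values, chosen only for illustration.
random.seed(0)
TRUE_A, TRUE_B, TRUE_SIGMA = 0.5, 1.0, 1.0
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
latent = [TRUE_A + TRUE_B * x + random.gauss(0.0, TRUE_SIGMA) for x in xs]

# Censored: every child is in the sample, but scores below zero are recorded as zero.
censored = [max(0.0, y) for y in latent]
share_censored = sum(1 for y in censored if y == 0.0) / len(censored)

# Truncated: children with scores at or below zero are missing from the sample.
trunc_pairs = [(x, y) for x, y in zip(xs, latent) if y > 0.0]
trunc_xs = [x for x, _ in trunc_pairs]
trunc_ys = [y for _, y in trunc_pairs]

_, slope_latent = ols(xs, latent)
a_cens, slope_censored = ols(xs, censored)
_, slope_truncated = ols(trunc_xs, trunc_ys)

# Compare the tobit likelihood at the true values with the naive OLS estimates.
sd_cens = residual_sd(a_cens, slope_censored, xs, censored)
ll_true = tobit_loglik(TRUE_A, TRUE_B, TRUE_SIGMA, xs, censored)
ll_ols = tobit_loglik(a_cens, slope_censored, sd_cens, xs, censored)

print("share censored:", round(share_censored, 3))
print("OLS slope, continuous:", round(slope_latent, 3))
print("OLS slope, censored:  ", round(slope_censored, 3))
print("OLS slope, truncated: ", round(slope_truncated, 3))
print("tobit log-lik at true values vs OLS estimates:", round(ll_true, 1), round(ll_ols, 1))
```

In this simulation the OLS slope on the continuous data sits close to the value used to generate it, while the slopes on the censored and truncated versions are pulled towards zero. The tobit log-likelihood is also higher at the generating values than at the naive OLS estimates, which is the intuition behind preferring a tobit model when a large share of scores pile up at the floor. A real analysis would maximise that likelihood with a proper optimiser or a statistics package rather than evaluate it at two points.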