Clean water and education: Honest feedback on an informal analysis
I have put together an informal analysis of the effect of clean water on education rates.
The analysis used ETL functions (written by Claude), data wrangling, EDA, and model fitting with sklearn and statsmodels. Because the goal of the analysis was inference rather than prediction, no hyperparameter tuning was necessary.
The clean water data was sourced from the WHO/UNICEF Joint Monitoring Programme for Water Supply, Sanitation, and Hygiene (JMP), while the education data was sourced from a popular Kaggle repository. The education data, despite coming from a less credible source, was already cleaned and itemized. The clean water data required some wrangling because it spans a large number of categories and the share of null values varies across the years 2000 to 2024. The final broad category of predictor variables selected was "clean water in schools, by country"; the outcome variable was "college education rates, by country."
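The kind of wrangling described above could look roughly like the sketch below. It is an assumption about the workflow, not the actual code: the table layouts, the column names (`school_water_pct`, `college_rate_pct`), the country labels, and the rule of keeping each country's most recent non-null value are all hypothetical.

```python
import pandas as pd

# Hypothetical long-format water table: one row per country and year.
water = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year": [2000, 2012, 2024] * 3,
    "school_water_pct": [40.0, 55.0, None, None, 70.0, 82.0, None, None, None],
})

# Hypothetical education table: one row per country.
education = pd.DataFrame({
    "country": ["A", "B", "D"],
    "college_rate_pct": [12.5, 30.0, 45.0],
})

# Share of missing values per year, to see where coverage is thin.
missing_by_year = water.groupby("year")["school_water_pct"].apply(
    lambda s: s.isna().mean()
)

# Keep each country's most recent non-null observation.
latest = (
    water.dropna(subset=["school_water_pct"])
    .sort_values("year")
    .groupby("country", as_index=False)
    .last()
)

# Inner join keeps only countries present in both sources;
# validate raises if either side has duplicate countries.
merged = latest.merge(education, on="country", how="inner", validate="one_to_one")

print(missing_by_year)
print(merged)
```

Taking the latest non-null value is only one option. Averaging over a window of years or restricting to a single reference year would be equally defensible, and the choice changes which countries survive the join.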
I would be grateful for any feedback on my analysis, which can be found at https://analysis-waterandeducation.com/.
TIA.