Using Natural Language Processing to Quantify Politicization in Foreign Aid Reports
Léa Gontard, Jelke Bloem
Institute for Logic, Language and Computation, University of Amsterdam
Political science experts are interested in measuring politicization in foreign aid reports in order to assess its correlation with major topics of concern, such as aid effectiveness. Governments that commission evaluations of foreign aid projects frequently make these reports publicly available. However, such reports are usually unstructured and unstandardized text.
The goal of this study is to operationalize a text-based metric of document politicization using contextual word embeddings. We work with a sample of public evaluations of health-related projects of the United States Agency for International Development (USAID). These evaluations were written by external third parties and obtained from the Development Experience Clearinghouse (DEC).
We use contextual word embeddings to capture ideological differences in how certain keywords are used in reports written under Republican and Democratic administrations. We compare two state-of-the-art contextual embedding models: Bidirectional Encoder Representations from Transformers (BERT) and its optimized variant, the Robustly Optimized BERT Pretraining Approach (RoBERTa). For each keyword, we average its contextual vector representations separately over the reports of each party and derive a politicization score from the cosine similarity between the two averaged vectors. The politicization score of a document is then the average of the politicization scores of the keywords present in it.
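The scoring steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the contextual embeddings (e.g. from BERT or RoBERTa) have already been extracted, one vector per keyword occurrence, and grouped by party. It also assumes the keyword score is 1 minus the cosine similarity, so that more divergent usage yields a higher score; the abstract only states that the score is derived from the cosine similarity. The function names and the toy 2-dimensional vectors are hypothetical.

```python
import numpy as np


def mean_vector(vectors):
    """Average the contextual embeddings of one keyword for one party."""
    return np.mean(np.asarray(vectors, dtype=float), axis=0)


def cosine_similarity(u, v):
    """Cosine similarity between two 1-d vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def keyword_politicization(rep_vectors, dem_vectors):
    """Assumed keyword score: 1 - cosine similarity of the party-averaged vectors.

    rep_vectors / dem_vectors: embeddings of the keyword's occurrences in
    reports written under Republican / Democratic administrations.
    """
    return 1.0 - cosine_similarity(mean_vector(rep_vectors), mean_vector(dem_vectors))


def document_politicization(tokens, keyword_scores):
    """Average the scores of the distinct scored keywords found in a document.

    Returns None when the document contains none of the scored keywords.
    """
    present = {t for t in tokens if t in keyword_scores}
    if not present:
        return None
    return sum(keyword_scores[k] for k in present) / len(present)


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for real BERT/RoBERTa vectors.
    scores = {
        "family": keyword_politicization([[1, 0], [1, 0]], [[0, 1]]),  # divergent usage
        "health": keyword_politicization([[1, 1]], [[1, 1]]),          # identical usage
    }
    doc = ["family", "health", "program", "family"]
    print({k: round(v, 3) for k, v in scores.items()})
    print(round(document_politicization(doc, scores), 3))
```

Here each keyword counts once per document regardless of how often it occurs; weighting by frequency would be an equally plausible reading of "averaging the politicization scores of the keywords present in a text".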
In our presentation, we will report the correlation between the generated politicization scores and expert-labelled data at the keyword level. We will also report the correlation at the document level, using a ‘silver standard’ score derived from the expert-labelled data.
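The abstract does not name the correlation coefficient used for this evaluation. As one possible choice, a rank correlation such as Spearman's rho can compare generated scores with expert labels without assuming the two are on the same scale. The sketch below is a self-contained, hypothetical illustration of that assumption; the input lists are placeholders, not the study's data.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical placeholders: model scores vs. expert labels per keyword.
    model_scores = [0.12, 0.40, 0.05, 0.33]
    expert_labels = [1, 3, 1, 2]
    print(round(spearman(model_scores, expert_labels), 3))
```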