Evaluating the Intrinsic Effects of Prompt Engineering: The Indirect Extraction of Semantic Similarities from GPTs

Xander Snelder

University of Amsterdam

Previous studies have evaluated the intrinsic effects of prompt engineering on the performance of Generative Pre-trained Transformers (GPTs) across various Natural Language Processing (NLP) tasks. However, little research has examined how prompt engineering can improve the extraction of semantic similarities from GPTs. Given the closed-source nature of these models, this study proposes a method to indirectly extract and calculate semantic similarity scores between word pairs. A statistical analysis is conducted to evaluate and optimize prompt engineering techniques using the Dutch and English SimLex-999 benchmarks. The evaluated prompts include zero-shot learning, variations of few-shot learning, and alternative semantic similarity scales. The results indicate that prompting each word pair individually yields the highest correlation with the SimLex-999 benchmarks, outperforming the other prompts and improving upon previous research that used Contextual Embedding Models (CEMs).
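The core pipeline can be illustrated with a minimal sketch: prompt the model for a numeric similarity rating for one word pair at a time, then correlate the collected ratings with the SimLex-999 gold scores using Spearman's rho. The model name, prompt wording, rating scale, and file layout below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of indirect similarity extraction: prompt a GPT for a
# per-pair rating and correlate the ratings with SimLex-999 gold scores.
# Model name, prompt wording, and 0-10 scale are illustrative assumptions.
import csv

from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def rate_pair(word1: str, word2: str) -> float:
    """Ask the model for a single numeric semantic similarity rating."""
    prompt = (
        f"On a scale from 0 to 10, how semantically similar are "
        f"'{word1}' and '{word2}'? Reply with only the number."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model, not necessarily the paper's
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduce variance across ratings
    )
    return float(response.choices[0].message.content.strip())


# SimLex-999 is distributed as a tab-separated file whose first two columns
# hold the word pair and whose 'SimLex999' column holds the gold score.
gold, predicted = [], []
with open("SimLex-999.txt", encoding="utf-8") as f:
    for row in csv.DictReader(f, delimiter="\t"):
        gold.append(float(row["SimLex999"]))
        predicted.append(rate_pair(row["word1"], row["word2"]))

rho, _ = spearmanr(gold, predicted)
print(f"Spearman correlation with SimLex-999: {rho:.3f}")
```

In practice one would also need to handle malformed model replies and rate limits; the sketch omits this to keep the per-pair prompting strategy, the technique the abstract reports as best-performing, in focus.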