Enhancing Chinese-English Translation in AI Chatbots: A Comparative Evaluation of ChatGPT-4o and Grok-beta Using a Health Science Text from The New York Times

Authors

Wang Wei and Zhou Weihong, Beijing International Studies University, China

Abstract

This study examines the effectiveness of contextual prompting, based on a universal prompting template for translation tasks, and of revision prompting in enhancing the quality of Chinese-to-English translations of scientific texts. ChatGPT-4o and Grok-beta served as the AI translation models. The source material was a New York Times article on the health benefits of sweet potatoes, together with its official Chinese translation. Translation quality was evaluated with the BLEU metric, complemented by qualitative measures, including accuracy, faithfulness, fluency, genre consistency, and terminology consistency, which are critical for assessing translations in the science and technology domain. Statistical analysis indicated only marginal improvement from second-stage prompting, which added commands for review and revision. These findings raise questions about the reliability of BLEU scores as a sole evaluation metric. The study highlights the potential of AI-assisted translation for specialized genres while identifying notable discrepancies between the chatbots' outputs. Based on these findings, the study underscores the need for refined methodologies for evaluating translation quality and advocates integrating more robust qualitative metrics in future research to improve the reliability and applicability of AI-assisted translation in specialized contexts.
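For readers unfamiliar with the BLEU metric mentioned above, the following is a minimal, simplified sketch of sentence-level BLEU with a brevity penalty (single reference, no smoothing). It is illustrative only; the study's actual scoring configuration is not reproduced here, and production evaluations typically use established implementations such as sacrebleu or NLTK.

```python
# Simplified single-reference BLEU: modified n-gram precision
# combined via geometric mean, scaled by a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    """Counts of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU score in [0, 1] against one reference."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ng, ref_ng = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_ng & ref_ng).values())  # clipped n-gram matches
        total = max(sum(cand_ng.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:  # geometric mean collapses to zero
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0, while a candidate sharing no words with the reference scores 0.0; one limitation noted in the abstract is that such surface n-gram overlap can miss qualities like faithfulness and genre consistency.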

Keywords

AI-assisted translation; contextual prompting; BLEU metric; qualitative evaluation; Chinese-English translation; health science texts