OpenAI has unveiled CriticGPT, a new AI model built on GPT-4, specifically designed to identify errors in code generated by ChatGPT. In trials, people assisted by CriticGPT outperformed unassisted reviewers 60% of the time when reviewing ChatGPT's code.
CriticGPT will be integrated into OpenAI’s Reinforcement Learning from Human Feedback (RLHF) labeling pipeline to equip AI trainers with better tools for evaluating complex AI outputs.
The GPT-4 models powering ChatGPT are designed to be helpful and interactive through RLHF. This process involves AI trainers comparing different responses and rating their quality. As ChatGPT’s reasoning improves, its mistakes become subtler, making it harder for trainers to identify inaccuracies. This highlights a key limitation of RLHF: advanced models can become so knowledgeable that human trainers struggle to provide meaningful feedback.
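To make the comparison step concrete, here is a minimal sketch of how pairwise preference labels might be collected and stored in an RLHF pipeline. The record layout and function names are illustrative assumptions, not OpenAI's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class PreferenceRecord:
    """One trainer judgment: which of two candidate responses is better."""
    prompt: str
    chosen: str    # response the trainer rated higher
    rejected: str  # response the trainer rated lower

def collect_preference(prompt: str, response_a: str, response_b: str,
                       trainer_picks_a: bool) -> PreferenceRecord:
    """Turn a single side-by-side comparison into a training record.

    Records like this are aggregated to fit a reward model, which in
    turn steers the policy model during RLHF fine-tuning.
    """
    if trainer_picks_a:
        return PreferenceRecord(prompt, chosen=response_a, rejected=response_b)
    return PreferenceRecord(prompt, chosen=response_b, rejected=response_a)

# Example: a trainer judges two candidate answers to the same prompt.
record = collect_preference(
    prompt="Explain what a race condition is.",
    response_a="A race condition occurs when concurrent operations...",
    response_b="It's when code runs fast.",
    trainer_picks_a=True,
)
print(record.chosen)
```

The subtler the model's mistakes, the harder it becomes for a trainer to make that chosen-versus-rejected call reliably, which is the gap CriticGPT is meant to close.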
CriticGPT has been trained to write critiques that highlight inaccuracies in ChatGPT's answers. Although its suggestions are not always perfect, they significantly help trainers identify more issues than when working without AI assistance. In experiments, Human+CriticGPT teams produced more comprehensive critiques than people working alone, and raised fewer false positives than the model working alone. A second trainer preferred the critiques from the Human+CriticGPT team over those from an unassisted reviewer more than 60% of the time.
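As a rough illustration of how a preference rate like "more than 60%" is computed, the sketch below tallies a second reviewer's pairwise judgments. The data is a hypothetical stand-in for the real evaluation records.

```python
# Hypothetical judgments from a second trainer: True means the
# Human+CriticGPT critique was preferred over the unassisted one.
judgments = [True, True, False, True, True, False, True, True, True, False]

win_rate = sum(judgments) / len(judgments)
print(f"Human+CriticGPT preferred {win_rate:.0%} of the time")  # 70% here
```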
CriticGPT was trained using a method similar to ChatGPT's but focused on identifying mistakes. AI trainers inserted deliberate errors into code written by ChatGPT and provided example feedback describing those errors. Trainers then compared multiple critiques of the modified code to evaluate CriticGPT's performance. CriticGPT's critiques were preferred over ChatGPT's in 63% of cases involving naturally occurring bugs, partly because it produced fewer unhelpful nitpicks and hallucinated problems less often.
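The tampering setup can be pictured as follows: a trainer plants a known bug in otherwise working code and writes the critique a good critic should produce, giving a reference against which model critiques can be compared. The helper names and the specific bug below are illustrative assumptions, not the actual tooling.

```python
from dataclasses import dataclass

@dataclass
class TamperedExample:
    """A training example built by deliberately planting a bug."""
    original_code: str       # code as ChatGPT wrote it
    tampered_code: str       # same code with a subtle bug inserted
    reference_critique: str  # trainer's description of the planted bug

def plant_off_by_one(code: str) -> TamperedExample:
    """Illustrative tampering: turn an inclusive loop bound into an exclusive one."""
    tampered = code.replace("range(n + 1)", "range(n)")
    return TamperedExample(
        original_code=code,
        tampered_code=tampered,
        reference_critique=(
            "The loop stops at n - 1, so the final value is never "
            "added to the total (off-by-one error)."
        ),
    )

example = plant_off_by_one(
    "def triangle(n):\n"
    "    total = 0\n"
    "    for i in range(n + 1):\n"
    "        total += i\n"
    "    return total\n"
)
print(example.tampered_code)
print(example.reference_critique)
```

Because the planted bug is known in advance, critiques of the tampered code can be judged on whether they actually catch it, rather than on how plausible they merely sound.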
Despite its success, CriticGPT has limitations. It was trained on short ChatGPT answers and needs further development to handle longer, more complex tasks. The model still hallucinates, and trainers occasionally make labeling mistakes influenced by those hallucinations. Finally, the current approach targets errors that can be pointed to at a single spot, and will need to expand to address errors spread across multiple parts of an answer.