Butterflies and ChatGPT

Prompting is the way we talk to generative AI and large language models (LLMs). The way we construct a prompt can change a model's decisions and affect the accuracy of the results it provides. Research from the University of Southern California Information Sciences Institute shows that a minute tweak, such as a space at the beginning of a prompt, can change the results. This is likened to chaos theory, where a butterfly flapping its wings generates a minor ripple in the air that results in a tornado several weeks later in a faraway land.

The researchers, who were sponsored by the US Defense Advanced Research Projects Agency (DARPA), chose ChatGPT and applied a range of prompt variations. Even slight changes led to significant differences in the results. They found many factors at play, and more work is needed to find ways of counteracting this effect.
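
As a rough illustration of what such variations can look like, here is a minimal sketch in which minor tweaks are expressed as simple string transformations of a base prompt. The function name and the specific perturbations are illustrative assumptions, not the researchers' exact set; only the leading-space tweak is mentioned in the article.

# Illustrative prompt perturbations: small string-level edits to one base prompt.
def variations(prompt: str) -> dict[str, str]:
    """Return a few minimally perturbed copies of a prompt (illustrative set)."""
    return {
        "original": prompt,
        "leading_space": " " + prompt,      # a single space at the start
        "trailing_space": prompt + " ",     # a single space at the end
        "greeting": "Hello! " + prompt,     # a polite opener (assumed example)
        "thanks": prompt + " Thank you.",   # a polite closer (assumed example)
    }

if __name__ == "__main__":
    base = "Classify the sentiment of this review as positive or negative: 'Great value.'"
    for name, text in variations(base).items():
        print(f"{name}: {text!r}")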

Why do slight tweaks result in such significant changes? Do the changes "confuse" the model? By running experiments across 11 classification tasks, the researchers were able to measure how often the LLM changed its predictions and the impact on accuracy. Studying the correlation between confusion and an instance's likelihood of having its answer changed (using a subset of tasks with individual human annotations) did not yield a full answer.
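
A minimal sketch of how such a measurement could be set up is shown below, assuming a placeholder classify() call that stands in for the model and a hypothetical flip_rate_and_accuracy() helper; neither is taken from the paper.

def classify(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call that returns a class label."""
    raise NotImplementedError("wire this up to the model being tested")

def flip_rate_and_accuracy(examples, perturb):
    """Measure how often a perturbation flips the prediction and how accuracy shifts.

    examples: list of (prompt, gold_label) pairs for one classification task
    perturb:  function that returns a perturbed copy of a prompt
    """
    flips = correct_base = correct_perturbed = 0
    for prompt, gold in examples:
        base_pred = classify(prompt)
        pert_pred = classify(perturb(prompt))
        flips += base_pred != pert_pred
        correct_base += base_pred == gold
        correct_perturbed += pert_pred == gold
    n = len(examples)
    return {
        "flip_rate": flips / n,
        "accuracy_base": correct_base / n,
        "accuracy_perturbed": correct_perturbed / n,
    }

Running a loop like this for each task and each perturbation would give the kind of prediction-flip and accuracy figures the study reports.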

So what?

Generating LLMs that are resistant to prompt changes and yield consistent, accurate answers is a logical next step. However, this will require a deeper understanding of why responses change under minor tweaks. Is there a way we can anticipate these changes in output? With ChatGPT being integrated into systems at scale, this work will be important for the future.

Link to full article.