Google's new ASPIRE method lets LLMs rate themselves

Researchers at Google and the University of Wisconsin-Madison have introduced a selective prediction system that lets LLMs score their own outputs. Through soft prompt tuning and self-evaluation learning, it enables models to achieve better scores than models 10 times their size, pointing to a promising direction for the next generation of reliable LLMs.

Researchers at the University of Wisconsin-Madison and Google recently developed a system called ASPIRE that allows large language models to rate their own outputs.

If the model rates its own generated result poorly, the user can recognize that the response may be a hallucination.

The system can also filter outputs based on these ratings: if a rating falls below a threshold, the model can respond with "I can't answer this question" instead, which should substantially mitigate the hallucination problem.

ASPIRE enables an LLM to output both an answer and a confidence score for that answer.
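The answer-plus-score behavior described above can be sketched as a simple abstention rule. This is a minimal illustration, not the paper's implementation: `generate_answer` is a hypothetical stand-in for an ASPIRE-style model that returns an answer together with its self-assessed confidence, and the threshold value is arbitrary.

```python
def generate_answer(question):
    # Placeholder: a real system would run the fine-tuned LLM here and
    # score the candidate answer with its learned self-evaluation prompt.
    return "Paris", 0.92

def answer_or_abstain(question, threshold=0.5):
    """Return the model's answer only when its self-rated confidence
    clears the threshold; otherwise abstain."""
    answer, confidence = generate_answer(question)
    if confidence < threshold:
        return "I can't answer this question.", confidence
    return answer, confidence
```

Raising the threshold trades coverage for reliability: the model answers fewer questions, but the answers it does give are more likely to be correct.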

The researchers' experimental results show that ASPIRE significantly outperforms traditional selective prediction methods on a variety of QA datasets (e.g., the CoQA benchmark).
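Selective prediction methods like this are typically evaluated by the trade-off between coverage (the fraction of questions answered) and accuracy on the answered subset. The sketch below shows that computation on made-up illustrative data, not figures from the paper:

```python
def selective_metrics(predictions, threshold):
    """predictions: list of (is_correct, confidence) pairs.
    Returns (coverage, accuracy-on-answered) at the given threshold."""
    answered = [p for p in predictions if p[1] >= threshold]
    coverage = len(answered) / len(predictions)
    accuracy = sum(c for c, _ in answered) / len(answered) if answered else 0.0
    return coverage, accuracy

# Illustrative data: wrong answers tend to get lower confidence scores.
preds = [(True, 0.9), (False, 0.2), (True, 0.7), (False, 0.4), (True, 0.8)]
print(selective_metrics(preds, 0.0))  # answer everything: coverage 1.0, accuracy 0.6
print(selective_metrics(preds, 0.5))  # abstain on low scores: coverage 0.6, accuracy 1.0
```

A better selective predictor keeps accuracy high while giving up as little coverage as possible.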

The idea is to let the LLM not only answer questions, but also evaluate its own answers.

On selective prediction benchmarks, the researchers showed that models equipped with ASPIRE can outperform models more than 10 times their size.

It's like asking students to check their own answers against the back of the textbook. That may sound implausible, but think about it: after finishing a problem, everyone forms a sense of how confident they are in their answer.

This is the essence of ASPIRE, which involves three stages:

  • Task-specific tuning
  • Answer sampling
  • Self-assessment learning
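The three stages above can be sketched as a high-level pipeline. Everything here is an illustrative stub under stated assumptions: the real system adapts an actual LLM with parameter-efficient (soft prompt) tuning, while this sketch just shows how the stages feed into one another.

```python
def task_specific_tuning(model, train_set):
    # Stage 1: adapt the model to the QA task, e.g. by training a small
    # set of soft-prompt parameters on (question, answer) pairs.
    return model  # stub: returns the model unchanged

def answer_sampling(model, train_set, n_samples=4):
    # Stage 2: sample candidate answers from the tuned model for each
    # training question and label each candidate correct or incorrect.
    labeled = []
    for question, reference in train_set:
        for _ in range(n_samples):
            candidate = model(question)
            labeled.append((question, candidate, candidate == reference))
    return labeled

def self_assessment_learning(model, labeled):
    # Stage 3: train the model to classify its own candidates as correct
    # or incorrect, yielding the confidence score used at inference time.
    return model  # stub: returns the model unchanged
```

The key design point is that stage 2 turns the model's own outputs into supervised training data for stage 3, so the self-evaluation signal is learned rather than hand-crafted.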

In the researchers' view, ASPIRE is not just another framework; it represents a promising path toward improving LLM reliability and reducing hallucinations across the board.

If LLMs are to become trusted partners in decision-making, then continuously optimizing their ability to make selective predictions brings us one step closer to realizing the full potential of large models.

With ASPIRE, the researchers hope to kick-start the evolution of the next generation of LLMs, which will enable the creation of more reliable and self-aware artificial intelligence.

The researchers' experimental journey with ASPIRE emphasizes a key shift in the LLM landscape: the capacity of a language model is not the be-all and end-all of its performance.

Instead, the effectiveness of models can be dramatically improved through strategic adjustments that allow for more accurate and confident predictions even in smaller models.

Thus, ASPIRE demonstrates that LLMs can intelligently assess the certainty of their own answers and significantly outperform models 10 times their size on selective prediction tasks.

Trend-Tech 2024-01-20 18:08
