Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text

Every day we learn new things about the ‘transformers’ that power AI chatbots. For example, they can take heavily scrambled text and descramble it. But transformers shouldn’t work that way. They rely on turning text into ‘tokens’, and if you scramble the text, you scramble the tokens too. Yet somehow it works:
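To make the setup concrete, here is a minimal sketch of the kind of scrambling involved: shuffling all the letters inside each word while keeping the words in order. (This is an illustrative reconstruction, not the paper's actual code; the function name and seed are my own.)

```python
import random

def scramble_words(sentence: str, seed: int = 0) -> str:
    """Shuffle the letters within each word, leaving word order intact."""
    rng = random.Random(seed)
    scrambled = []
    for word in sentence.split():
        letters = list(word)
        rng.shuffle(letters)
        scrambled.append("".join(letters))
    return " ".join(scrambled)

print(scramble_words("the quick brown fox jumps over the lazy dog"))
```

Each output word contains exactly the same letters as the input word, just reordered, so a tokenizer sees almost entirely different tokens even though a human (sometimes) and GPT-4 (almost always) can still recover the original.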

…We found that only GPT-4 nearly flawlessly processes inputs with unnatural errors, even under the extreme condition, a task that poses significant challenges for other LLMs and often even for humans. Specifically, GPT-4 can almost perfectly reconstruct the original sentences from scrambled ones, decreasing the edit distance by 95%, even when all letters within each word are entirely scrambled. It is counter-intuitive that LLMs can exhibit such resilience despite severe disruption to input tokenization caused by scrambled text.
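The “decreasing the edit distance by 95%” figure can be read as: after GPT-4's reconstruction, the Levenshtein distance to the original sentence is about 5% of what it was for the scrambled input. A rough sketch of that measurement (my own illustrative code, assuming plain character-level Levenshtein distance):

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def edit_distance_reduction(original: str, scrambled: str, recovered: str) -> float:
    """Fraction of the scrambled input's edit distance removed by the reconstruction."""
    before = levenshtein(scrambled, original)
    after = levenshtein(recovered, original)
    return 1.0 - after / before

# A perfect reconstruction reduces the distance by 100%:
print(edit_distance_reduction("hello world", "lehlo dworl", "hello world"))
```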

Read the full paper here.