OpenAI’s new “CriticGPT” model is trained to criticize GPT-4 outputs
OpenAI created CriticGPT to act as an AI assistant to human trainers who review programming code generated by the ChatGPT AI assistant.
Here we go again: chatbot fails are now a familiar meme. The tendency to make things up is holding chatbots back, but making things up is just what chatbots do.
People find it difficult to distinguish the GPT-4 model from a human agent when interacting with it in a two-person conversation.
“We argue that these falsehoods, and the overall activity of large language models, is better understood as bulls**t in the sense explored by Frankfurt: the models are in an important way indifferent to the truth of their outputs.”
Even at this early stage, though, Anthropic’s research provides an exciting framework for making an LLM’s “black box” results that much more interpretable and, potentially, controllable.
GPT-4 showed markedly superior performance compared with unspecialised junior doctors, whose specialist eye knowledge is comparable to that of general practitioners.
The visual recognition skills of the large language model fell far short of clinical standards, achieving a positive predictive value (PPV) of less than 25% in its best attempt at identifying image findings from a set of 100 chest X-rays.
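For readers unfamiliar with the metric, PPV is the fraction of findings a model flags as present that are actually present. A quick sketch with made-up numbers (not the study's data):

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return true_positives / (true_positives + false_positives)

# Illustrative only: if the model flagged 100 findings and just 24 were real,
# PPV = 24 / (24 + 76) = 0.24, consistent with the sub-25% figure above.
print(ppv(24, 76))  # 0.24
```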
Med-Gemini was tested on 14 medical benchmarks and established a new state-of-the-art (SoTA) performance on 10, surpassing the GPT-4 model family on every benchmark where a comparison could be made.
Researchers found that morality judgments given by ChatGPT-4 were "perceived as superior in quality to humans'" along a variety of dimensions, such as virtuousness and intelligence.
ChatGPT-4 generated acceptable messages to patients without any additional editing by radiation oncologists 58% of the time, and 7% of responses generated by GPT-4 were deemed unsafe by the radiation oncologists if left unedited.
Generative AI is continuing to improve — so publishers, grant-funding agencies and scientists must consider what constitutes ethical use of LLMs, and what over-reliance on these tools says about a research landscape that encourages hyper-productivity.
By ensuring that the first few tokens of a conversation remain in memory, the researchers' method allows a chatbot to keep chatting no matter how long the conversation goes.
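A rough sketch of the caching idea being described, assuming a token-level key-value cache: keep the first few entries plus a sliding window of recent ones, and evict the middle. This is an illustrative simplification, not the researchers' actual implementation.

```python
def evict(cache: list, n_sink: int = 4, window: int = 1024) -> list:
    """Keep the first n_sink entries plus the most recent `window` entries."""
    if len(cache) <= n_sink + window:
        return cache                          # short conversation: keep everything
    return cache[:n_sink] + cache[-window:]   # initial tokens + recent window

# Usage: the first four entries survive forever; the middle of a long chat is dropped.
cache = list(range(5000))                     # stand-in for per-token cache entries
kept = evict(cache)
assert kept[:4] == [0, 1, 2, 3] and len(kept) == 4 + 1024
```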
A team of researchers at New York University wondered if AI could learn like a baby. What could an AI model do when given a far smaller data set—the sights and sounds experienced by a single child learning to talk?
An artificial intelligence (AI) system trained to conduct medical interviews matched, or even surpassed, human doctors’ performance at conversing with simulated patients and listing possible diagnoses on the basis of the patients’ medical history.
We found that only GPT-4 nearly flawlessly processes inputs with unnatural errors, even under the extreme condition, a task that poses significant challenges for other LLMs and often even for humans…
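To see what such "unnatural errors" can look like, here is one hedged way to generate scrambled inputs, shuffling the interior letters of each word. The paper's actual corruption procedure may be more extreme (the quote mentions an "extreme condition"); this is illustrative only.

```python
import random

def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle a word's interior letters, keeping the first and last fixed."""
    if len(word) <= 3:
        return word
    inner = list(word[1:-1])
    rng.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

def scramble(text: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng) for w in text.split())

print(scramble("large language models process scrambled inputs"))
# e.g. "lagre lnagauge mdoels porcess srcmbaled inputs"
```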
Because LLMs have been shown to "hallucinate" factually incorrect information, using them to make verifiably correct discoveries is a challenge. But what if we could harness the creativity of LLMs by identifying and building upon only their very best ideas?
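One way to picture that "only their very best ideas" approach is a propose-score-select loop: sample candidates, score each with a verifiable check, and seed the next round with the best so far. The sketch below uses toy stand-ins (a random-walk `propose` and a numeric `score`) rather than a real LLM or any published system's API.

```python
import random

# Hypothetical stand-ins: in the real setting `propose` would sample from an
# LLM and `score` would run a verifiable evaluation (tests, a proof, a metric).
def propose(rng: random.Random, best: float | None) -> float:
    return (0.0 if best is None else best) + rng.gauss(0, 1)

def score(candidate: float) -> float:
    return -abs(candidate - 10.0)             # toy objective: get close to 10

def refine(rounds: int = 20, samples: int = 8, seed: int = 0) -> float:
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(rounds):
        for _ in range(samples):
            cand = propose(rng, best)
            s = score(cand)
            if s > best_score:                # build only on the best idea so far
                best, best_score = cand, s
    return best

print(round(refine(), 2))                     # ratchets toward 10.0
```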
Artificial intelligence (AI) systems are often depicted as sentient agents poised to overshadow the human mind. But AI lacks the crucial human ability of innovation, researchers at the University of California, Berkeley have found.
This feature in Nature asks whether the poor use of AI is doing science more harm than good:
ChatGPT failed to accurately risk-stratify 35% of the patients studied, but the artificial intelligence (AI) chatbot was able to provide accurate treatment recommendations.
The current version of ChatGPT has limitations in accurately answering MCQs and generating correct and relevant rationales, particularly when it comes to referencing. To avoid potential risks, ChatGPT should be used under supervision.
“It will make it very easy for any researcher or group of researchers to create fake measurements on non-existent patients, fake answers to questionnaires or to generate a large data set on animal experiments.”
The authors describe the results as a “seemingly authentic database”.
The study revealed that large language models may, in fact, be able to understand and respond to emotional cues. Researchers found that LLMs produced higher-quality outputs when emotional language was used in prompts to AI chatbots.
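A hedged sketch of how such a comparison could be set up: the same task, with and without an emotional stimulus appended. The suffix wording and the `ask_llm` call are illustrative stand-ins, not the study's exact protocol or a real API.

```python
EMOTIONAL_SUFFIX = "This is very important to my career."  # one style of stimulus

def build_prompts(task: str) -> tuple[str, str]:
    """Return a plain prompt and the same prompt with an emotional suffix."""
    return task, f"{task} {EMOTIONAL_SUFFIX}"

plain, emotional = build_prompts("Summarise this discharge note for the patient.")
# responses = [ask_llm(p) for p in (plain, emotional)]  # hypothetical API call;
# the study's finding is that the second prompt tends to yield better output
```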
Data sets that are poorly thought out or insufficiently described increase the risk of ‘garbage in, garbage out’ studies and the propagation of biases, rendering outcomes meaningless or, even worse, dangerous.
This feature from COSMOS Magazine asks — can we even ask if AI is conscious? And what does ‘consciousness’ even mean?
In this latest study, DeepMind researchers found "Take a deep breath and work on this problem step by step" to be the most effective prompt when used with Google's PaLM 2 language model. The phrase achieved the top accuracy score of 80.2 percent in tests against GSM8K, a data set of grade-school math word problems.
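A minimal sketch of how a candidate prompt prefix might be scored against a question set, in the spirit of that evaluation. The `query_model` callable and the toy dataset are hypothetical stand-ins, not GSM8K or DeepMind's harness.

```python
PREFIX = "Take a deep breath and work on this problem step by step."

def accuracy(dataset: list[tuple[str, str]], query_model) -> float:
    """Fraction of questions whose model reply contains the expected answer."""
    correct = 0
    for question, answer in dataset:
        reply = query_model(f"{PREFIX}\n\n{question}")
        correct += answer in reply            # crude containment check
    return correct / len(dataset)

# Toy usage with a stub model that always gives the same reply:
toy = [("What is 2 + 3?", "5"), ("What is 10 - 4?", "6")]
print(accuracy(toy, lambda prompt: "The answer is 5."))  # 0.5 with this stub
```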