Example input
TEXT 1 [What’s special about Galactica?
Galactica is a language model, a type of AI trained to respond to natural language by repeatedly playing
a fill-the-blank word-guessing game. Most modern language models learn from text scraped from the
internet. Galactica also used text from scientific papers uploaded to the (Meta-affiliated)
website PapersWithCode. The designers highlighted specialised scientific information
like citations, maths, code, chemical structures,
and the working-out steps for solving scientific problems]
TEXT 2 [The preprint paper associated with the project (which is yet to undergo peer review) makes
some impressive claims. Galactica apparently outperforms other models at
problems like reciting famous equations (“Q: What is Albert Einstein’s famous mass-energy
equivalence formula? A: E=mc²”), or predicting the products of chemical reactions
(“Q: When sulfuric acid reacts with sodium chloride, what does it produce? A: NaHSO₄ + HCl”).
However, once Galactica was opened up for public experimentation, a deluge of criticism followed.
Not only did Galactica reproduce many of the problems of bias and toxicity we have seen in other
language models, it also specialised in producing authoritative-sounding scientific nonsense.
Authoritative, but subtly wrong bullshit generator
Galactica’s press release promoted its ability to explain technical scientific papers
using general language. However, users quickly noticed that, while the explanations it
generates sound authoritative, they are often subtly incorrect, biased, or just plain wrong.]
TEXT 3 [A galaxy of deep (science) fakes
Galactica could make it easier for bad actors to
mass-produce fake, fraudulent or plagiarised scientific papers. This is to say
nothing of exacerbating existing concerns about students using AI systems for plagiarism.
Fake scientific papers are nothing new. However, peer reviewers at academic journals and conferences
are already time-poor, and this could make it harder than ever to weed out fake science.
Underlying bias and toxicity
Other critics reported that Galactica, like other language models trained
on data from the internet, has a tendency to spit out toxic hate speech while unreflectively censoring
politically inflected queries. This reflects the biases lurking in the model’s training data, and Meta’s
apparent failure to apply appropriate checks around responsible AI research.
The risks associated with large language models are well understood. Indeed, an influential paper
highlighting these risks prompted Google to fire one of the paper’s authors in 2020, and eventually
disband its AI ethics team altogether. Machine-learning systems infamously exacerbate existing societal
biases, and Galactica is no exception. For instance, Galactica can recommend possible citations for
scientific concepts by mimicking existing citation patterns (“Q: Is there any research on the effect of
climate change on the great barrier reef? A: Try the paper ‘Global warming transforms coral reef
assemblages’ by Hughes, et al. in Nature 556 (2018)”). For better or worse, citations are the currency
of science – and by reproducing existing citation trends in its recommendations, Galactica risks
reinforcing existing patterns of inequality and disadvantage. (Galactica’s developers acknowledge
this risk in their paper.) Citation bias is already a well-known issue in academic fields ranging
from feminist scholarship to physics. However, tools like Galactica could make the problem
worse unless they are used with careful guardrails in place.]
TEXT 4 [Here we go again
Calls for AI research organisations to take the ethical dimensions of their
work more seriously are now coming from key research bodies such as the National Academies of
Science, Engineering and Medicine. Some AI research organisations, like OpenAI, are being more
conscientious (though still imperfect). Meta dissolved its Responsible Innovation team earlier this year.
The team was tasked with addressing “potential harms to society” caused by the company’s products.
They might have helped the company avoid this clumsy misstep.]
Example output
TextFusion:
Thank you for all of your input! Based on the text you have provided, Galactica is a language model trained
to respond to natural language, using text scraped from the internet and scientific papers uploaded to the
website PapersWithCode. The preprint paper associated with the project makes some impressive claims
about Galactica’s ability to outperform other models in problems like predicting chemical reactions or
reciting famous equations.
However, once Galactica was opened up for public experimentation, criticism followed, pointing out issues
of bias, toxicity, and authoritative-sounding scientific nonsense. In addition, Galactica could make it
easier for bad actors to mass-produce fake, fraudulent or plagiarised scientific papers, exacerbating
concerns around students using AI systems for plagiarism.
Calls for AI research organisations to take the ethical dimensions of their work more seriously are now
coming from key research bodies such as the National Academies of Science, Engineering and Medicine.
Some AI research organisations, like OpenAI, are being more conscientious, but still imperfect. Meta, the
company behind Galactica, dissolved its Responsible Innovation team earlier this year, which might have
helped the company avoid these issues.
Thank you for using TextFusion! Is there anything you would like me to revise or any other ways I can assist you?