I asked ChatGPT to recommend me some sci-fi books... Most recommendations were invented and don't exist

We all know by now that ChatGPT is full of incorrect data, but I trusted it would not go wrong when I asked for a list of sci-fi book recommendations (mostly short-story anthologies in Spanish), including title, publisher, print year and, of course, ISBN.

Some of the books do exist, but the majority are nowhere to be found. I picked the one that caught my interest the most and contacted the publisher directly after I could not find it on their website or anywhere else.

This is what they replied (Google Translate):


ChatGPT got it wrong.

We don't have any books with that title.

In the ISBN it gave you, the last digit is incorrect. The correct one (9788477028383) corresponds to "The Holy Fountain" by Henry James.

Nor have we published any science fiction anthologies in the last 25 years.


A quick search on the "old site" shows that others have experienced the same with ChatGPT and ISBN searches... For some reason I thought it would not go wrong in this case, but it did.
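
Since the publisher's reply hinges on the last digit, here is a minimal sketch of the standard ISBN-13 check-digit rule (the first 12 digits are weighted 1, 3, 1, 3, ... and the last digit brings the total to a multiple of 10). The function names `isbn13_check_digit` and `is_valid_isbn13` are my own, not from any library. The "wrong" number in the usage lines is a made-up variant for illustration, not the ISBN ChatGPT actually produced.

```python
def isbn13_check_digit(first12: str) -> int:
    """Return the check digit for the first 12 digits of an ISBN-13."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... from the leftmost digit.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    # The check digit rounds the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check only that the number is well-formed, not that the book exists."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


# The ISBN the publisher gave as correct passes the check.
print(is_valid_isbn13("9788477028383"))  # True
# Hypothetical variant with a different last digit fails it.
print(is_valid_isbn13("9788477028384"))  # False
```

A passing checksum only says the number is well-formed. As the reply shows, a valid ISBN can still belong to a completely different book, so the title has to be looked up in a catalogue anyway.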

23 comments
  • I’m possibly just vomiting something you already know here, but an important distinction is that the problem isn’t that ChatGPT is full of “incorrect data”, it’s that it has no concept of correct or incorrect, and it doesn’t store any data in the sense we think of it.

    It is a (large) language model (LLM) which does one thing, albeit incredibly well: output a token (a word or part of a word) according to the statistical probability of that token following the previous tokens, using a statistical model generated from all the data used to train it.

    It doesn’t know what a book is, nor does it have any memory of any titles of any books. It only has connections between tokens, scored by their statistical probability of following each other.

    It’s like a really advanced version of predictive texting, or the predictive algorithm that Google uses when you start typing a search.

    If you ask it a question, it only starts to string together tokens which form an answer because the network has been trained on vast quantities of text which have a question-answer format. It doesn’t know it’s answering you, or even what a question is; it just outputs the most statistically probable token, appends it to your input, and then runs that loop.

    Sometimes it outputs something accurate - perhaps because it encountered a particular book title enough times in the training data that it is statistically probable that it will output it again; or perhaps because the title itself is statistically probable (e.g. the title “Voyage to the Stars Beyond” will be much more statistically likely than “Significantly Nine Crescent Unduly”, even if neither title actually existed in the training data).

    Lots of the newer AI services put different LLMs together, along with other tools to control output and format input in a way which makes the response more predictable, or even run a network request to look up additional data (more tokens). But the most significant part of the underlying tech is still fundamentally unable to conceptualise the notion of accuracy, let alone uphold it.

    Maybe there will be another breakthrough in another area of AI research of which LLMs will form an important part, but the hype train has been running hard to categorise LLMs as AI, which is disingenuous. They’re incredibly impressive non-intelligent automatic text generators.

  • You asked for fiction so it gave you some on a whole new level.

    On a more serious note, other services like Bing AI chat are more suited to this. It will behave more like an assistant for this kind of query and be able to search the web for lists of highly rated sci-fi titles; it can also give you titles similar to something else you enjoyed.

    ChatGPT is the same tech behind that, but it's more closed off and unable to do those things properly. If it does spit out some good titles, it'll be a coincidence, and based on outdated data from whenever it was last trained.
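
The "output the most probable token, append it, repeat" loop described in the first comment can be sketched with a toy bigram model. This is a heavy simplification under stated assumptions: the four-line corpus is made up, the model only counts which word follows which, and real LLMs use neural networks over much longer contexts. The names `train_bigrams` and `generate` are hypothetical, chosen just for this sketch.

```python
from collections import Counter, defaultdict

# Made-up training text, purely for illustration.
corpus = [
    "voyage to the stars",
    "voyage to the moon",
    "journey to the stars",
    "return to the stars beyond",
]


def train_bigrams(lines):
    """Count, for every token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = line.split() + ["</s>"]  # "</s>" marks the end of a title
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, prompt, max_tokens=10):
    """Repeatedly append the most frequent follower of the last token."""
    tokens = prompt.split()
    if not tokens:
        raise ValueError("prompt must contain at least one token")
    for _ in range(max_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # token never seen in training: nothing to predict
        next_token = followers.most_common(1)[0][0]
        if next_token == "</s>":
            break
        tokens.append(next_token)
    return " ".join(tokens)


counts = train_bigrams(corpus)
print(generate(counts, "return"))  # return to the stars
```

"return to the stars" never appears in the corpus as a full title, yet the model produces it because each step is the most likely continuation. Nothing in the loop checks whether the result exists, which is the same failure mode as the invented books, just at a much smaller scale.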
