I just recently read the paper "Delving into ChatGPT usage in academic writing through excess vocabulary" by Kobak et al. Their premise (from the abstract) is that [models] can produce inaccurate information, reinforce existing biases, and can easily be misused. So the authors analyse PubMed abstracts for vocabulary changes and identify certain words that have become more common post-LLM. They find that words such as "delves", "showcasing", "underscores", "intricate", "excel", "pivotal", "encompassing", and "enhancing" all show increased usage and are hence suspect.
While this data is indeed interesting, I wonder why LLMs tend to use these words. Aren't LLM outputs supposed to be a reflection of the data they were trained on? Surely that means these words are more common in some part of the training data than we would expect?