Stubsack: weekly thread for sneers not worth an entire post, week ending 28th December 2025

Want to wade into the snowy surf of the abyss? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned so many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).
Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

(Credit and/or blame to David Gerard for starting this. Merry Christmas, happy Hanukkah, and happy holidays in general!)

163 comments

AI researchers are rapidly embracing AI reviews, with the new Stanford Agentic Reviewer. Surely nothing could possibly go wrong!
Here's the "tech overview" for their website.
Our agentic reviewer provides rapid feedback to researchers on their work to help them to rapidly iterate and improve their research.
The inspiration for this project was a conversation that one of us had with a student (not from Stanford) that had their research paper rejected 6 times over 3 years. They got a round of feedback roughly every 6 months from the peer review process, and this commentary formed the basis for their next round of revisions. The 6 month iteration cycle was painfully slow, and the noisy reviews — which were more focused on judging a paper's worth than providing constructive feedback — gave only a weak signal for where to go next.
How is it that whenever people argue for the magical benefits of AI on a task, it always comes down to "well actually, humans suck at the task too! Look, humans make mistakes!" That seems to be the only way they can justify the fact that AI sucks. At least it spews garbage fast!
(Also, this is a little mean, but if someone's paper got rejected 6 times in a row, perhaps it's time to throw in the towel, accept that the project was never that good in the first place, and try better ideas. Not every idea works out, especially in research.)
When modified to output a 1-10 score by training to mimic ICLR 2025 reviews (which are public), we found that the Spearman correlation (higher is better) between one human reviewer and another is 0.41, whereas the correlation between AI and one human reviewer is 0.42. This suggests the agentic reviewer is approaching human-level performance.
Actually, all my concerns are now completely gone. They found that one number is bigger than another number, so I take back all of my counterarguments. I now have full faith that this is going to work out.
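For anyone who hasn't touched rank statistics in a while, here is a minimal sketch of what a Spearman correlation between two reviewers' scores actually computes: it is just the Pearson correlation of the ranks. The function names and the two score lists are made up for illustration; they are not from the Stanford site or from ICLR data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical 1-10 scores from two reviewers on the same six papers.
reviewer_a = [2, 3, 5, 6, 6, 8]
reviewer_b = [3, 2, 6, 5, 7, 8]
print(round(spearman(reviewer_a, reviewer_b), 3))
```

Note that this only measures whether two score lists order the papers similarly. It says nothing about whether the text of a review is any good, and a gap like 0.41 versus 0.42 means little without some estimate of the noise in each number.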
Reviews are AI generated, and may contain errors.
We had built this for researchers seeking feedback on their work. If you are a reviewer for a conference, we discourage using this in any way that violates the policies of that conference.
Of course, we need the mandatory disclaimers that will definitely be enforced. No reviewer will ever be a lazy bum and use this AI for their actual conference reviews.
- we found that the Spearman correlation (higher is better) between one human reviewer and another is 0.41
  This stinks to high heaven. Why would you want these to be more highly correlated? There's a reason you assign multiple reviewers, preferably with slightly different backgrounds, to a single paper. Reviews are obviously subjective! There's going to be some consensus (especially with very bad papers; really bad papers almost always get universally low scores, because you know, they suck), but whether a particular reviewer likes what you did and how you presented it is a bit of a lottery.
  Also, the worth of a review is much more than a 1-10 score. It should contain detailed justification for the reviewer's decision, so that a meta-reviewer can then look and pinpoint relevant feedback, or even decide that a low-scoring paper is worthwhile and can be published after small changes. All of this is an abstraction, of course a slightly flawed one, but of humans talking to each other. Show your paper to 3 people and you'll get 4 different impressions. This is not a bug!
- the noisy reviews — which were more focused on judging a paper’s worth than providing constructive feedback
  dafuq?
  
  Yeah, it's not like reviewers can just write "This paper is utter trash. Score: 2" unless ML is somehow an even worse field than I previously thought.
  They referenced someone who had a paper get rejected from conferences six times, which to me is an indication that their idea just isn't that good. I don't mean this as a personal attack; everyone has bad ideas. It's just that at some point, you just have to cut your losses with a bad idea and instead use your time to develop better ideas.
  So I am suspicious that when they say "constructive feedback", they don't mean "how do I make this idea good" but instead "what are the magic words that will get my paper accepted into a conference". ML has become a cutthroat publish-or-perish field, after all. It certainly won't help that LLMs are effectively trained to glaze the user at all times.
- Problem: Reviewers do not provide constructive criticism, or at least reasons for a paper to be rejected. Solution: Fake it with a clanker.
  Genius.
