You spend years on your work. You probably have loans. Your income is pitiful. And this is the structural thing that gets your name out. Now someone says “hey take a risk, don’t do it and break the system.”
Well…you first 🤷‍♂️ They publish in this garbage because it’s the only way to move up, and these garbage systems carry on because everyone has to participate. Hate the game. Don’t blame those who are by and large forced to participate.
It would require a lot of effort from people with clout. It’s a big fight to pick. I am very much in favor of picking that fight, but we need to be a little sympathetic to what that entails.
Decline to review for the big journals. Why give them free labor? Do academic service in other ways.
If you're organizing a workshop or conference, put the papers online for free. If you're just participating and not organizing, then suggest they put the papers online for free. Here's an example: https://aclanthology.org/ If that's too time-consuming, use: https://arxiv.org/
100%. People need to stop thinking big changes can be made "by individuals". This kind of stuff needs regulation and state alternatives brought about by popular pressure; otherwise it's impossible to break for an average worker dealing with the private sector.
Funding agencies have huge power here; demanding that research be published in OA journals is perhaps a good start (with limits on $ spent publishing, perhaps).
There are a lot of academics out there with a good amount of clout and who are relatively safe. I don't think I've heard of anything remotely worthy on these topics from any researcher with clout, publicly at least. Even privately (I used to be in academia), my feeling was most don't even know how to think and talk about it, in large part because I don't think they do think and talk about it at all.
And that's because most academics are frankly shit at thinking and engaging on collective and systemic issues. Many just do not want to, and instead want to embrace the whole "I live and work in an ideal ivory tower disconnected from society because what I do is bigger than society". Many get their dopamine kicks from the publication system and don't think about how that's not a good thing. Seriously, they don't deserve as much sympathy as you might think ... academia can be a surprisingly childish place. That the publication system came to be at all is proof of that frankly, where they were all duped by someone feeding them ego-dopamine hits. It's honestly kinda sad.
I feel like most of academia on the research side would be happy to see it collapse, but the current system is too deeply tied to the money for any quick change.
I worked in academia for almost a decade and never met a researcher who wouldn't openly support Sci-Hub (well, some warned their students that it was illegal to type these specific search terms and click on the wrong link, downloading the PDF for free).
One lecturer actually had notes on their slides for the differences between the latest version of the course book and the one before it, since the latest one wasn't available for free anywhere but they wanted to use a couple of chapters from the new book (they scanned and distributed the relevant parts themselves).
Yep. But that is all a part of the problem. If academics can't organise themselves enough to have some influence over something which is basically owned and run by them already (they write the papers and then review the papers and then are the ones reading and citing the papers and caring the most about the quality and popularity of the papers) ... then they can't be trusted to ensure the quality of their practice and institutions going forward, especially under the ever-increasing encroachment of capitalistic forces.
Modern day academics are damn well lucky that they inherited a system and culture that developed some old aristocratic ideals into a set of conventions and practices!
As someone who's not too familiar with the bureaucracy of academia I have to ask: Can't the authors just upload all their studies to ResearchGate or some other website if they want? I know that they often share it privately with others when they request a paper, so can they post it publicly too?
The problems are wider than that. Besides, relying on "individuals just doing the right thing and going a little further to do so" is, IMO, a trap. Fix the system instead. The little thing everyone can do is think about the system and realise it needs fixing.
you're risking copyright nastygrams, but people still do it, and even upload preprints and full articles to scihub, because fuck that and it's maybe free citations
When will scientists just self-publish? I mean seriously, nowadays there is nothing between a researcher and publishing their stuff on the web. Only thing would be peer-reviewing, if you want that, but then just organize it without Elsevier. Reviewers get paid jack shit so you can just do a peer-reviewing fediverse instance where only the mods know the people so it's still double-blind.
This system is just to dangle carrots in front of young researchers chasing their PhD
Because of "impact score", the journal your work gets placed in has a huge impact on future funding. It's a very frustrating process, and trying to go around it is like suicide for your lab, so it has to be more of a top-down fix because the bottom-up one is never going to happen.
That's why everyone uses Sci-Hub. These publishers are terrible companies, up there with EA in unpopularity.
It sounds like all it would take to destroy the predatory for-profit publication oligarchs is a majority of the top few hundred scientists, across major disciplines, rejecting it and switching to a completely decentralized peer-2-peer open-source system in protest... The publication companies seem to gatekeep and provide no value. It's like Reddit. The site's essentially worthless. All of the value is generated by the content creators.
It’s commonplace in my field (nuclear physics) to share the preprint version of your article, typically on arxiv.org. You can update the article as you respond to peer reviewers too. The only difference between this and the paywalled publisher version is that the publisher version will have additional formatting edits by the journal.
If you search for articles on Google Scholar, it groups the preprint and published versions together so it’s easy to find the non-paywalled copy. The standard journals I publish in even sort of encourage this; you can submit the LaTeX documents and figures by just putting in the URL of an arXiv manuscript.
The US Department of Energy now requires any research they fund be made publicly available. So any article I publish is also automatically posted to osti.gov 1 year after its initial publication. This version is also grouped into the Google Scholar search results.
It’s an imperfect system, but it’s getting much better than it was even just a decade ago.
Yeah I know about this, but personally in our field I don't see anybody bothering with preprints sadly. Maybe we should though, sounds like the first step.
We (I'm a CS researcher) already kind of do. I upload almost everything to arxiv.org and ResearchGate. Some fields support this more than others, though.
Unfortunately that wouldn't work as this is information inside the PDF itself so it has nothing to do with the file hash (although that is one way to track.)
Now that this is known, it's not enough to remove metadata from the PDF itself. Each image inside a PDF, for example, can contain metadata. I say this because they're apparently starting a game of whack-a-mole, and this won't stop here.
It will be slow-ish and probably make the file larger, but if you're sharing a PDF that only you are supposed to have access to, it's worth it. MAT or exiftool should work.
Edit: as spoken about in another comment thread here, there is also pdf/image steganography as a technique they can use.
I know PDF providers who visibly print the customer's name or number in the header of every page, along with short copyright text. I use qpdf --stream-data=uncompress to turn the PDF into human-readable, PostScript-like text, and then Python+regex to remove the header texts, which stand out a bit from other PDF elements. The script throws an error if more or fewer elements than pages have been removed, but that hasn't happened yet. Processed documents sometimes have screwed-up non-ASCII characters in the Table of Contents for some reason, but I don't have the originals anymore so IDK if it's my fault. Still, I wouldn't share the PDFs except in text-only or printed form because of any other steganographic shenanigans in the file. I would absolutely torrent them if I could repurchase them under a new identity and verify that the files are identical.
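For anyone curious, the header-stripping step can be sketched roughly like this. It's not the actual script: the header wording ("Licensed to ..."), the `Tj` text-showing pattern, and the names `HEADER_RE` and `strip_headers` are all assumptions for illustration, and it only works on bytes that are already decompressed. A real file would also need its stream lengths and xref table repaired afterwards, which this doesn't do.

```python
import re

# Hypothetical header: a text-showing operation such as
# "(Licensed to Jane Doe #12345) Tj". The wording is an assumption;
# the pattern has to be adapted to whatever the real file contains.
HEADER_RE = re.compile(rb"\(Licensed to [^)]*\)\s*Tj")


def strip_headers(data: bytes, page_count: int) -> bytes:
    """Blank out per-page header text; fail loudly if the count looks wrong."""
    # Replace the header string with an empty string so the operator stays valid.
    cleaned, removed = HEADER_RE.subn(b"() Tj", data)
    if removed != page_count:
        raise ValueError(f"removed {removed} headers, expected {page_count}")
    return cleaned


if __name__ == "__main__":
    sample = (
        b"BT (Licensed to Jane Doe #12345) Tj ET\n"
        b"BT (Chapter 1) Tj ET\n"
        b"BT (Licensed to Jane Doe #12345) Tj ET\n"
    )
    print(strip_headers(sample, page_count=2).decode("ascii"))
```

The count check is the useful part: if the regex matches more or fewer times than there are pages, something other than the header was probably hit (or missed), and it's better to stop than to share the result.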
BTW, has anyone figured out how to embed Python code in PDF? The whitespace always gets reencoded as x-coordinates so copy&pasting it never preserves indentation. No, you can't use the Ogham Space Mark (Unicode's only non-blank character classified as a space) for indentation in Python, I tried.
Imagine they have an internal tool to check if the hash exists in their database, something like
"SELECT user FROM downloads WHERE hash = '" + hash + "';"
You set the pdf hash to be 1'; DROP TABLE books;-- they scan it, and it effectively deletes their entire business lmfaoo.
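Worth noting that the trick only works if their tool really glues the query together from strings like that. A minimal sketch of the boring safe version, using Python's sqlite3 with a made-up schema (the `downloads` and `books` tables and the function names are assumptions; nobody here knows what the publisher actually runs):

```python
import sqlite3


def find_downloader(conn: sqlite3.Connection, file_hash: str):
    """Look up who downloaded a file, passing the hash as a bound parameter."""
    row = conn.execute(
        "SELECT user FROM downloads WHERE hash = ?", (file_hash,)
    ).fetchone()
    return row[0] if row else None


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema, invented to match the query in the comment above.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE downloads (user TEXT, hash TEXT)")
    conn.execute("CREATE TABLE books (title TEXT)")
    conn.execute("INSERT INTO downloads VALUES ('alice', 'abc123')")
    conn.execute("INSERT INTO books VALUES ('Some Paper')")
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "1'; DROP TABLE books;--"
    print(find_downloader(conn, "abc123"))
    print(find_downloader(conn, payload))
    # The books table is still there because the payload was only ever data.
    print(conn.execute("SELECT COUNT(*) FROM books").fetchone()[0])
```

With a bound parameter the payload is just compared as an odd-looking string, so the lookup finds nothing and no table is dropped.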
Another idea might be to duplicate the PDF many times and insert bogus metadata for each. Then submit requests saying that you found an illegal distribution of the PDF. If their process isn't automated it would waste a lot of time on their part to find the culprit Lol
I think it's more interesting to think of how to weaponize their own hash rather than deleting it
That's using your ass. This is an active threat to society and it demands active countermeasures.
I'd bet they have a SaaS 'partner' who trawls SciHub and other similar sites. I'll try to remember to see if there is any hint of how this is being accomplished over the next few days.
I feel like it would be negligible degradation for this purpose. Still might not anonymize whoever shares it though; it could be watermarked with the same metadata (https://en.m.wikipedia.org/wiki/Machine_Identification_Code) without being noticeable to the naked eye.
I don't understand the "that's not how PDFs work" criticism.
Removing data from the original file is the whole point of the exercise! Of course unique tokens can be hidden in plain sight in images, letter spacing, etc. If we want to make sure to remove that we need to degrade the quality of the PDF so that this information is lost in said lossy conversion.
You could write a script to automatically watch for new files in a folder and strip metadata from every file, I guess. I did something like that for images a while back.
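Something along these lines, as a rough sketch rather than a finished tool. It polls a folder instead of using OS file notifications, and it assumes exiftool is installed and on PATH; the function names are made up. As far as I know, exiftool's PDF edits are incremental updates, so the old values may still be recoverable unless the file is rewritten afterwards.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List, Set


def strip_with_exiftool(path: Path) -> None:
    # Assumes exiftool is available. "-all=" clears the tags it can write;
    # "-overwrite_original" skips the "_original" backup copy.
    subprocess.run(
        ["exiftool", "-all=", "-overwrite_original", str(path)], check=True
    )


def scan_once(
    folder: Path, seen: Set[Path], strip: Callable[[Path], None]
) -> List[Path]:
    """Strip every file in `folder` that has not been handled yet."""
    handled = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path not in seen:
            strip(path)
            seen.add(path)
            handled.append(path)
    return handled


def watch(folder: Path, interval: float = 5.0) -> None:
    """Poll `folder` forever, stripping new files as they appear."""
    seen: Set[Path] = set()
    while True:
        scan_once(folder, seen, strip_with_exiftool)
        time.sleep(interval)
```

Passing the stripping function in as an argument makes it easy to swap exiftool for MAT or anything else. A file that is still being written when the scan runs could get processed half-finished, so a real version would want to wait until the size stops changing.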
If the paper is worth it and has an original, non-OCR'd text layer, it had better be exported to some other format. We don't call good things a PDF file, lol. It's clumsy and heavy, has a non-adjustable font size and useless empty borders, comes with various limits and takes on DRM, and its editing is usually done via paid software. This format shall die off.
The only reason academia needs it is strict references to exact pages, but that's not that hard to emulate. The upsides are overwhelming.
I've had my share of properly digitizing PDFs into e-book and text-processing formats, and it's a pain in the ass, but if I know it'll be read by someone besides me, I'm okay with putting a bit more effort into it.
Well, I guess PDF has one thing going for it (which might not be relevant for scientific papers): The same file will render the same on any platform (assuming the reader implements all of the PDF spec to a T).
Thanks. I've used simpler tools (besides pirated Acrobat) and wrote some scripts to optimize deDRMing and breaking passwords on them. That one you posted looks promising. I'll save it to toy with in my free time.
FB2 is a well-known format among Russian pirates, but it can and should be improved because it sucks ass in many ways. FB3 was announced long ago but it hasn't got any traction yet.
EPUB is more popular, so it'll probably be the go-to format for most books the US and EU create, but it isn't much better.
Other than that, even Doc/Docx is better than PDF, but I'd recommend RTF since it has fewer traces of M$ bullshit, and while it's an imperfect format, it's still better.
Most papers are made in TeX or LaTeX. These formats separate display from data in such a way that they can be quickly formatted to a variety of page sizes, margins, text sizes, etc. with minimal effort. It's basically an open standard typesetting format. You can create and edit TeX in any text editor and run it through a program to prepare it for print or viewing. Nothing else can handle math formulas, tables, charts, etc. with the same elegance. If you've ever struggled to write a math paper in Microsoft Word, seriously question why your professor hasn't already forced you to learn about LaTeX.
Can't all of us researchers who are technically good with web servers start an open-source alternative to these paid services? I get that we need to publish with a renowned publisher, but we could also decide together to publish to an alternative open-source option. This way the open-source alternative also grows.
Some time last year I learned of an example of such a project (peerreview on GitHub):
The goal of this project was to create an open access "Peer Review" platform:
Peer Review is an open access, reputation based scientific publishing system that has the potential to replace the journal system with a single, community run website. It is free to publish, free to access, and the plan is to support it with donations and (eventually, hopefully) institutional support.
It allows academic authors to submit a draft of a paper for review by peers in their field, and then to publish it for public consumption once they are ready. It allows their peers to exercise post-publish quality control of papers by voting them up or down and posting public responses.
I just looked it up now to see how it is going... And I am a bit saddened to find out that the developer decided to stop. The author has a blog in which he wrote about the project and about why he is not so optimistic about the prospects of crowd sourced peer review anymore: https://www.theroadgoeson.com/crowdsourcing-peer-review-probably-wont-work , and related posts referenced therein.
It is only one opinion, but at least it is the opinion of someone who has thought about this some time and made a real effort towards the goal, so maybe you find some value from his perspective.
Personally, I am still optimistic about this being possible. But that's easy for me to say as I have not invested the effort!
I do like the intermediaries that have popped up, like PubPeer. I highly recommend that everyone get the extension as it adds context to many different articles.
This kind of thing needs to be started by universities and/or research institutes. Not the code part, but the organising the first journals part. It's going to get nowhere without establishment buy-in.
Citation count is a shoddy metric for a paper's quality. Not just because there are citation cartels, but because the reason stuff gets cited is not contained in the metric. And then to top it all off, as soon as a metric becomes a target, it ceases to be a good metric.
Print to PDF might just convert the PDF into Postscript instructions and back again without the original PDF’s metadata, but that probably depends on the Print to PDF software being used and its settings.
Probably won't take off because scientists need reputable journals and not some random fediverse publishers.
Is it fucked up? Absolutely. But something else needs to be changed before this would be possible.
Also, why not ditch the concept of a "publisher" to begin with? Why not have a national or international article index, graded by the article level? It's not that we live in a paper era, and for those who still need it, we can always print.
Institutions could easily form their own journals. National organizations that provide grants could also require you to publish in their journal. Universities can run their own journals. These sorts of entities already exist and provide article access for free, publishing in them would just need to be normalized.
These are just a few options without researchers organizing anything for themselves.
Well, we could assign the reviewers more "significance" here.
We could give them points and if they "upvote" a paper it gives the paper a bit more visibility/reputation. If the reviewer has actually reviewed the paper it gives the paper more points.
How much a reviewer is able to "spend" could be based on the reputation of the institution, their own papers in the same field and the points they get for their reviews by other users.
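To make that a bit more concrete, here's a toy version of the points idea. Every weight and scale in it is an arbitrary placeholder picked for illustration, and the names (`Reviewer`, `budget`, `vote_weight`) are made up; a real system would need a lot more thought about gaming.

```python
from dataclasses import dataclass


@dataclass
class Reviewer:
    institution_reputation: float  # 0..1, hypothetical scale
    own_papers_in_field: int
    review_points: float  # points other users gave to this person's reviews


def budget(r: Reviewer) -> float:
    """How many points a reviewer may spend; the weights are placeholders."""
    return (
        10 * r.institution_reputation
        + 2 * r.own_papers_in_field
        + r.review_points
    )


def vote_weight(r: Reviewer, wrote_full_review: bool) -> float:
    """A bare upvote counts a little; an actual review counts more (5x here)."""
    base = budget(r) / 100
    return base * (5 if wrote_full_review else 1)


if __name__ == "__main__":
    r = Reviewer(institution_reputation=0.5, own_papers_in_field=3, review_points=4.0)
    print(budget(r), vote_weight(r, False), vote_weight(r, True))
```

The only design point that matters is the ordering: a reviewer who actually reviewed moves the paper more than one who just clicked upvote, and both are capped by what the reviewer has earned.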
I kind of assume this with any digital media. Games, music, ebooks, stock videos, whatever - embedding a tiny unique ID is very easy and can allow publishers to track down leakers/pirates.
Honestly, even though as a consumer I don't like it, I don't mind it that much. Doesn't seem right to take the extreme position of "publishers should not be allowed to have ANY way of finding out who is leaking things". There needs to be a balance.
Online phone-home DRM is a huge fuck no, but a benign little piece of metadata that doesn't interact with anything and can't be used to spy on me? Whatever, I can accept it.
It can be used to spy on any decent scientist who sends a friend papers that their institution has access to but the friend doesn't. Much fun. As a reminder, publishers don't pay reviewers, don't pay for additional research, editing is typically minimal, and research is funded publicly, so what they own is the social capital of owning a big journal.
It can be used to spy on any decent scientist who sends a friend papers that their institution has access to but the friend doesn't.
By "spy" I mean things like: know how many times I've read the PDF, when I've opened it, which parts of it I've read most, what program I used to open the PDF, how many copies of the PDF I've made, how many people I've emailed it to, etc. etc. etc.
This technique can do none of that. The only thing it can do is: if someone uploads the PDF to a mass sharing network, and an employee of the publisher downloads it from that mass sharing network and compares this metadata with the internal database, then they can see which of their users originally downloaded it and when they originally downloaded the PDF. It tells them nothing about how it got there. Maybe the original user shared it with 20 of their colleagues (a legitimate use of a downloaded PDF), and one of those colleagues uploaded that file to the mass sharing site without telling the original downloader. It doesn't prove one way or the other. It's an extremely small amount of information that's only useful for catching systemic uploaders, e.g. a single user who has uploaded hundreds or thousands of PDFs that they downloaded from the publisher using the same account.
And a savvy user can always strip that metadata out.
As a reminder, ...
All true, and fucked up, but it's not related to what I was talking about. I was talking about the general use of this technique.
Doesn’t seem right to take the extreme position of “publishers should not be allowed to have ANY way of finding out who is leaking things”. There needs to be a balance.
Nah, fuck that; that's both the opposite of an extreme position and is exactly the one we should take!
Copyright itself is a privilege and only exists in the first place "to promote the progress of science and the useful arts." Any entity that doesn't respect that purpose doesn't deserve to benefit from it at all.
You are arguing that Elsevier shouldn't exist at all, or needs to be forcibly changed into something more fair and more free. I 100% agree with this.
But my point was in general, not about Elsevier but about all digital publications of any kind. This includes indie publications and indie games. If an indie developer makes a game, and it sells maybe 20 copies but gets pirated thousands of times, do you still say "fuck that" to figuring out which "customer" shared the game?
I agree with "fuck that" to huge publishers, and by all means pirate all their shit, but smaller guys need some way to safeguard themselves, and there's no way to decide that small guys can use a certain tool and big guys cannot.
It would be pretty trivial for a script to automatically detect and delete tags like this, I would think. Diff two versions of the file and swap all diff characters to any non-display character.
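A rough sketch of that idea in Python, using the standard library's difflib. It assumes you have two copies of the same document downloaded under different accounts, and that the tags sit in uncompressed bytes; `mask_differences` and the sample IDs are made up for illustration.

```python
from difflib import SequenceMatcher


def mask_differences(a: bytes, b: bytes, filler: bytes = b" ") -> bytes:
    """Return a copy of `a` with every byte that differs from `b` overwritten."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out = bytearray()
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out += a[i1:i2]
        else:
            # Keep the length of `a` unchanged so byte offsets stay put.
            out += filler * (i2 - i1)
    return bytes(out)


if __name__ == "__main__":
    copy_one = b"Title (id:AAAA1111) body"
    copy_two = b"Title (id:BBBB2222) body"
    print(mask_differences(copy_one, copy_two))
```

Two copies only reveal the bytes that happen to differ between those two, so an ID that shares characters across accounts would be partly left in place, and anything hidden in compressed streams or images won't diff cleanly. SequenceMatcher also gets slow on large files, so this is more a proof of concept than a tool.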