Most of the data is scraped, it's not up to the website.
It is up to whoever runs the AI, and those are the people I'm addressing for the most part, though plenty of websites do have control over what data is fed to the AI they're using. In Grammarly's case it's absolutely up to them what data is used and whether users get an option to opt out of having their work used for training the AI, as shown by the fact that they offer that option with the business license. They just choose not to offer it to other users.
You can't give a list of citations since it isn't a search engine, it doesn't know where the information comes from and it's highly transformative, it melds information from hundreds if not thousands of different sources.
It's all code, and the people coding it are 100% capable of programming it to keep track of where the information comes from. Even if it's transformative, that doesn't prevent it from keeping track of what was transformed. I'm aware that the number of citations would be extensive, and I'm fine with that.
If it worked only with volunteer work, there would simply be not enough data.
According to who? There are plenty of ways to get data from voluntary sources, just like we do for any number of studies. It's up to whoever runs the AI to put in the legwork to get enough data that way, and there are lots of methods. You don't have to just sit and wait for people to come to you and sign up, though based on the AI frenzy I bet they could have gotten plenty of data that way from people who are curious and want to contribute to AI training as a novel concept. Making AI data gathering on websites something people can opt in to or out of is just one way of making it more ethical than forcibly taking that data without permission.
Any law restricting data use in AI is only going to benefit corporations,
I fail to see how requiring permission and offering the option to opt out of having your data used would benefit corporations. That just sounds like an excuse to not even try to regulate them.
You can let them opt out, but then you need to do the same for whole websites, which leads to a corporate hellscape where three companies own our whole economy since they are the only ones who can train AIs.
I don't understand how part A leads to part B here. Why would those corporations have an advantage just because everyone with an AI, including them, has to offer the option to opt out? Also, it's entirely possible to restrict the scope of an AI or regulate AI monopolies alongside regulating stuff like basic consent. Historically, a lack of regulation is what causes corporate hellscapes, because without something keeping them in check the larger companies will take advantage of their reach to do whatever they want on a larger scale, pushing out or merging with competitors. It's not like requiring permission and providing an opt-out would give them more of an advantage than they already have.