I think that makes a lot of sense and it's exactly the kind of stuff we should be considering at this stage. I also agree that humans are the ideal source of empathy and the best way to get around systems of secret code words and other methods that are used to circumvent algorithmic control.
But I also think AI-generated algorithms have their place. By design, content moderation is an unpaid task. Many volunteers are very good at moderation, but the work takes up a lot of their time and some of the best minds may decide to step away from moderation if it becomes too burdensome. On Reddit, I saw a lot of examples of moderators who, as flawed humans, made choices that were not empathetic, but rather driven by a desire for power and control. Of course, if we make mistakes during the algorithm training process and allow our AI to be trained on the lowest common denominator of moderators, the algorithm may end up being just as power-hungry - or even worse, considering that bots do not ever tire or log off.
But I do think there are ways to get past that, if we're careful about how we implement such systems. Depending on your definition, bots may not be capable of empathy, but based on some conversations with AI chatbots, I think AI can be trained to simulate empathy very closely. As you mentioned about secret messages, though, bots will likely always be behind the curve when it comes to recognizing dog whistles and otherwise obfuscated hate speech. As long as we always have dedicated, empathetic humans taking part, the AI should be able to catch up quickly whenever a new pattern emerges. We may even be able to tackle these issues by sending our own bots into enemy territory and learning the dog whistles as they're being developed, though there could be negative side effects to this strategy as well.
I think my primary concern when pushing for these kinds of algorithms is to make sure we don't overburden moderation teams. I've worked too long in jobs where too much was expected for too little pay, and all the best and brightest left for greener pastures. I think the best way to make moderation rewarding is to automate the most obvious choices. If someone is blasting hate speech, a bot can be very certain that the comment should be hidden and a moderator can review the bot's decision at a later time if they wish. I just want to get the most boring repetitive tasks off of moderators' plates so they can focus on decisions that actually require nuance.
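To make that concrete, here is a rough sketch of the kind of triage I have in mind. Everything in it is an assumption on my part: the threshold values, the `triage` function, and the idea that some classifier hands us a hate-speech score between 0 and 1. It is only meant to show the split between "bot is very certain, hide now and let a moderator review later" and "needs a human."

```python
from dataclasses import dataclass

# Hypothetical cutoffs; real values would need tuning against moderator feedback.
AUTO_HIDE_THRESHOLD = 0.95   # bot is "very certain": hide immediately
REVIEW_THRESHOLD = 0.60      # uncertain: leave visible, ask a human


@dataclass
class Decision:
    comment_id: str
    action: str    # "hide", "queue", or "allow"
    score: float   # classifier score that led to the action


def triage(comment_id: str, score: float) -> Decision:
    """Route a comment based on an assumed hate-speech score in [0, 1]."""
    if score >= AUTO_HIDE_THRESHOLD:
        # Obvious case: hidden right away, still reviewable by a moderator later.
        return Decision(comment_id, "hide", score)
    if score >= REVIEW_THRESHOLD:
        # Nuanced case: goes to the human moderation queue.
        return Decision(comment_id, "queue", score)
    return Decision(comment_id, "allow", score)
```

The point of the middle band is that moderators only ever see the comments where their judgment actually matters.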
Something I really like about what you said was the idea of promoting choice. I was on a different social media platform lately, one which has a significant userbase of minors and therefore needs fast, over-tuned moderation to limit liabilities (Campfire, the communication tool for Pokémon Go). I was chatting with a friend and a comment I thought was mundane got automatically blocked because it contained the word "trash." Now, I think this indicates they are using a low-quality AI, because a better AI would have seen from context that the comment was fine. In any case, I was immediately frustrated because I thought my friend would get the impression that I said something really bad, since my comment was blocked. Except I soon found out that you can choose to see hidden comments by clicking on them. Without the choice of seeing the comment, I felt hate towards the algorithm. But when presented with the choice of seeing censored comments, my opinion immediately flipped and I actually appreciated the algorithm, because it provides a safe platform where distasteful comments are immediately blocked so the young and impressionable can't see them, while adults are able to remove the block to see the comments if they desire.
I think we can take this a step further and have automatically blocked comments show categories of reasons why they were blocked. For example, I might never want to click on comments that were blocked due to containing racial slurs. But when I see comments blocked because of spoilers, maybe I do want to take a peek at select comments. And maybe for general curse words, I want to remove the filter entirely so that on my device, those comments are never hidden from me in the first place. This would allow for some curating of the user experience before moderators even have a chance to arrive on the scene.
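Here is a minimal sketch of how those per-category preferences could work. The category names, the `Pref` options, and the rule that the strictest preference wins when a comment is blocked for several reasons are all my own assumptions, not anything an existing platform does.

```python
from enum import Enum


class Pref(Enum):
    HIDE = "hide"          # never show, no click-to-reveal
    COLLAPSE = "collapse"  # hidden, but the user can click to reveal
    SHOW = "show"          # filter removed on this user's device


# Assumed default for any category the user has not configured.
DEFAULT_PREF = Pref.COLLAPSE

# Strictest first, so a comment with several block reasons gets the safest treatment.
_STRICTNESS = [Pref.HIDE, Pref.COLLAPSE, Pref.SHOW]


def display_mode(block_reasons: set[str], prefs: dict[str, Pref]) -> Pref:
    """Decide how a blocked comment appears for one user."""
    if not block_reasons:
        return Pref.SHOW  # the comment was never blocked
    modes = [prefs.get(reason, DEFAULT_PREF) for reason in block_reasons]
    return min(modes, key=_STRICTNESS.index)


# Example preferences matching the cases described above (hypothetical category names).
my_prefs = {
    "racial_slur": Pref.HIDE,
    "spoiler": Pref.COLLAPSE,
    "profanity": Pref.SHOW,
}
```

With these settings, a comment blocked only for profanity shows up normally for me, while one blocked for both profanity and a racial slur stays fully hidden.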
On the whole, I agree with you that humans are the ideal. But I am fearful of a future where bots are so advanced, we have no way to tell what is a human account and what is not. Whether we like it or not, moderators may eventually be bots - not because the system is designed that way but because many accounts will be bots and admins picking their moderation staff won't be able to reliably tell the difference.
The most worrisome aspect of this future, in my mind, will be the idea of voting. A message may be hidden because of identified hate speech, and we may eventually have an option for users to vote whether the comment was correctly hidden or if the block should be removed. But if a majority of users are bots, a bad actor could have their bot swarm vote on removing blocks from comments that were correctly hidden due to containing hate speech. Whether it happens at the user level or at the moderator level, this is a risk. So, in my mind, one of the most important tasks we will need AI to perform is identifying other AI. At first, humans will be able to identify AI by the way they talk. But chatbots will become so realistic that eventually, we will need to rely on clues that humans are bad at detecting, such as when a swarm of bots performs similar actions in tandem, coordinating in a way that humans do not.
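As a very rough illustration of the "similar actions in tandem" clue, one simple heuristic is to compare the voting records of every pair of accounts and flag pairs that overlap almost perfectly. This is only a sketch under my own assumptions: the data shape, the `min_votes` and `threshold` values, and the choice of Jaccard similarity are all made up for the example, and a real system would need far more signals (timing, account age, and so on) to avoid flagging like-minded humans.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of votes two accounts have in common, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suspicious_pairs(
    votes: dict[str, set[tuple[str, str]]],
    min_votes: int = 5,       # assumed: ignore accounts with too little history
    threshold: float = 0.9,   # assumed: how similar is "too similar"
) -> list[tuple[str, str, float]]:
    """Flag account pairs whose (comment_id, vote) records nearly match.

    `votes` maps an account name to the set of (comment_id, vote) pairs
    that account has cast, e.g. ("c17", "unblock").
    """
    active = {user: v for user, v in votes.items() if len(v) >= min_votes}
    flagged = []
    for u1, u2 in combinations(sorted(active), 2):
        similarity = jaccard(active[u1], active[u2])
        if similarity >= threshold:
            flagged.append((u1, u2, similarity))
    return flagged
```

Flagged pairs would go to human moderators for a closer look rather than being banned automatically, since the heuristic on its own can't tell a bot swarm from a group of friends who happen to agree.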
And I think it's important we start this work now, because if the bots controlled by the opposition get good enough before we are able to reliably detect them, our detection abilities will always be behind the curve. In a worst-case scenario, we would have a bot that thinks the most realistic swarms of bots are all human and the most fake-sounding groups of humans are all bots. This is the future I'm most concerned about heading off. I know the scenario is not palatable, and at this stage it may feel better to avoid AI entirely, but I think bots taking over this platform is a very real possibility and we should do our best to prevent it.