You are correct, CLIP can misinterpret things, which is where human judgment comes in. Having CLIP produce probabilities for the terms describing what you are looking for, then applying a bit of heuristics on top, can go a long way. You don't need to train it to recognize a nude child, because it has already been trained to recognize a child and trained to recognize nudity, so if an image scores high on both "nude" and "child", just throw it out. Granted, it might be a picture of a woman breastfeeding while a toddler looks on, which is not child pornography, but unless that is the specific image being prompted for, it is not a big deal to just toss it. We understand the conceptual link, so we can set the threshold parameters and adjust as needed.
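For illustration, here is a minimal sketch of what that conjunctive thresholding heuristic could look like with an off-the-shelf CLIP checkpoint via the Hugging Face transformers wrapper. The prompts, the checkpoint name, and the threshold values are placeholders I'm assuming for the example, not tuned or recommended settings; a human reviewer would adjust them against real false-positive and false-negative rates.

```python
# Sketch: score an image against independent concept prompts with CLIP and
# discard it only if it clears the threshold on every flagged concept.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical concepts and starting thresholds -- to be tuned by a human.
CONCEPTS = {
    "a photo of a child": 0.22,
    "a photo containing nudity": 0.22,
}

def flag_image(path: str) -> bool:
    """Return True if the image scores above threshold on *all* concepts."""
    image = Image.open(path).convert("RGB")
    text_inputs = processor(text=list(CONCEPTS), return_tensors="pt", padding=True)
    image_inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        text_emb = model.get_text_features(**text_inputs)
        image_emb = model.get_image_features(**image_inputs)

    # Cosine similarity between the image embedding and each concept prompt.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    sims = (image_emb @ text_emb.T).squeeze(0)

    # Conjunctive rule: only the combination of concepts triggers a discard.
    return all(s.item() >= t for s, t in zip(sims, CONCEPTS.values()))
```

The point is not that these exact prompts or numbers are right, but that the logic is a simple "both concepts score high" rule sitting on top of a model that already knows each concept separately.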
As for the companies, it is a tough area. The argument that a company which produced a piece of software is culpable for the misuse of that software is a very tenuous one. There have been attempts to make gun manufacturers liable for gun deaths (especially handguns, since they really only have the purpose of killing humans). That one I can see, as the firearm killing a person is not a "misuse"; it is the express purpose of its creation. But this would be more akin to holding Adobe liable for child pornography edited in Lightroom, or Dropbox liable for someone using the Dropbox API to set up a private distribution network for illicit materials. In reality, as long as the company did not design a product with the illegal activity expressly in mind, it really shouldn't be culpable for how people use it once it is in the wild.
I do feel that more needs to be done to make training data available for public inspection, as well as to develop forensic methods for interrogating the end products to determine whether their creators are lying about or hiding materials used for training. That is a broader issue, though, one that covers many of the ethical and legal questions surrounding AI training.