the electricity bill for each query – to power the servers and their chillers – would still make running these giant models very expensive.
This assumes there won't be radical advances in cost-effective hardware to run the queries.
AI proponents work precisely towards such advances: hardware tailored to running the best performing models, at far lower costs than current GPUs and GPU derivates.
Something like "execute in RAM" neural network accelerators, could reduce query costs by several orders of magnitude.
The fraud of the cryptocurrency bubble was far more pervasive than the fraud in the dotcom bubble, so much so that without the fraud, there’s almost nothing left.
Ironically, what the crypto bubble left behind, was a surplus of GPUs, which got repurposed for AI... and just like crypto left GPUs to move onto purpose-built ASICs and other models like PoS instead of PoW, so does AI need to leave GPUs and move onto purpose-built hardware with better models (quantized NNs are a good example).
As a tangent, we could talk about how the gaming industry has enabled a GPU industry that in turn has enabled these off-shots.