> The kinds of AIs that we have today can't really conceptualize a world outside the texts they process
The LLMs we have today process "tokens", which can represent anything. The fact that they happen to look "more intelligent" to humans when used in a "text goes in, text comes out" fashion is purely a human bias, not a limitation of the AI.
Make no mistake: LLMs can process, conceptualize, and output anything that can be represented with a token, including the initial, intermediate, or final states of other AIs, for which even humans lack a token/word. That is how multimodal AIs with plugins work right now.
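To make the "tokens can represent anything" point concrete, here is a minimal Python sketch using a tiny, made-up vocabulary (every ID and name in it is hypothetical): the model itself only ever sees integer IDs, and whether a given ID stands for a word piece, an emoji, an image-patch code, or a tool-call marker is a convention applied by whatever sits around the model.

```python
# Hypothetical toy vocabulary: some IDs map to text pieces, others to
# non-text "tokens" (an image-patch code, an emoji, a tool-call marker).
vocab = {
    0: "Hello",
    1: " world",
    2: "🌍",                 # emoji token
    3: "<image_patch_417>",   # discrete code an image encoder might emit
    4: "<call:weather_api>",  # marker a plugin/tool layer could act on
}

token_ids = [0, 1, 2, 3, 4]

# The transformer only consumes and produces integer sequences like this one;
# turning IDs back into text, pixels, or an API call happens outside the model.
print(token_ids)
print("".join(vocab[i] for i in token_ids))
```

The same mechanism is why bolting a new modality or a plugin onto an LLM mostly means agreeing on how to map that modality into and out of the shared token space.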
Using text (with or without emojis) as an external input/output system is just a way to interact with humans, to interact with other AIs designed to input/output text, and to feed their own output back to themselves (to reflect).