Posts 6 · Comments 146 · Joined 3 yr. ago

  • Why is Microsoft canceling a Gigawatt of data center capacity while telling everybody that it didn’t have enough data centers to handle demand for its AI products? I suppose there’s one way of looking at it: that Microsoft may currently have a capacity issue, but soon won’t, meaning that further expansion is unnecessary.

    This is precisely it. Internally, Microsoft's SREs perform capacity planning at multiple levels: a single product might be growing and need more resources over the next few months while its department is shrinking overall and will use less capacity over the next few years. A datacenter takes at least four years of construction before its capacity comes online (usually closer to five), which is too long a horizon for any individual product... unless, of course, your product is ChatGPT and it needs a datacenter's worth of resources by itself. Even if OpenAI were siloed from Microsoft or Azure, the planners would still know that OpenAI is among their neediest customers and include them in the forecast. (Rough sketch of the multi-level rollup at the end of this comment.)

    Source: scuttlebutt from other SREs, mostly. An analogous situation happened with Google's App Engine: its biggest users drove App Engine's internal capacity planning at the product level, which in turn drove datacenter planning, because App Engine was mostly built from one big footprint in one little Oklahoma datacenter.

    Conclusion: Microsoft's going to drop OpenAI as a customer. Oracle's going to pick up the responsibility. Microsoft knows that there's no money to be made here, and is eager to see how expensive that lesson will be for Oracle; Oracle is fairly new to the business of running a public cloud and likely thinks they can offer a better platform than Azure, especially when fueled by delicious Arabian oil-fund money. Folks may want to close OpenAI accounts if they don't want Oracle billing them someday.
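
    To make the multi-level point concrete, here's the rough sketch I mentioned, with entirely made-up numbers: individual products forecast a few months out, but the department rolls those forecasts up over a datacenter's build lead time, and a growing product can still sit inside a shrinking department.

    ```python
    # Rough sketch (made-up numbers): product-level forecasts roll up into a
    # department-level forecast, and the department-level number over the
    # datacenter build lead time is what drives construction decisions.

    # Forecast monthly compute demand per product, in arbitrary "rack" units.
    product_growth = {
        "product_a": {"current": 40,  "monthly_growth": 0.03},   # growing
        "product_b": {"current": 400, "monthly_growth": -0.03},  # shrinking
        "product_c": {"current": 60,  "monthly_growth": 0.00},   # flat
    }

    def forecast(current: float, monthly_growth: float, months: int) -> float:
        """Project demand forward with simple compounding."""
        return current * (1 + monthly_growth) ** months

    # A datacenter takes roughly four to five years to build, so the
    # department plans on that horizon even though each product only
    # looks a few months ahead.
    LEAD_TIME_MONTHS = 48

    department_now = sum(p["current"] for p in product_growth.values())
    department_later = sum(
        forecast(p["current"], p["monthly_growth"], LEAD_TIME_MONTHS)
        for p in product_growth.values()
    )

    print(f"department demand today:      {department_now:.0f} racks")
    print(f"department demand in {LEAD_TIME_MONTHS} months: {department_later:.0f} racks")
    if department_later < department_now:
        print("no new construction needed, even though product_a is growing")
    ```

    The point isn't the numbers; it's that the build/no-build decision happens at the rollup level, not the product level, which is why one product's growth doesn't automatically mean more datacenters.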

  • Look, I get your perspective, but zooming out, there's context here that nobody's mentioning, and the thread has deteriorated into name-calling instead of looking for insight.

    In theory, a training pass needs one read-through of the input data, and we know of existing systems that achieve that, from well-trodden n-gram models to the wholly hypothetical large Lempel-Ziv models (toy single-pass sketch at the end of this comment). Viewed that way, most modern training methods are extremely wasteful: Transformers, Mamba, RWKV, etc. are trading time for space to try to make relatively small models, and it's an expensive tradeoff.

    From that perspective, we should expect somebody to eventually demonstrate that the Transformers paradigm sucks. Mamba and RWKV are good examples of modifying old ideas about RNNs to take advantage of GPUs, but are still stuck in the idea that having a GPU perform lots of gradient descent is good. If you want to critique something, critique the gradient worship!

    I swear, it's like whenever Chinese folks do anything, the rest of the blogosphere goes into a panic. I'm not going to insult anybody directly, but I'm so fucking tired of mathlessness.

    Also, point of order: Meta open-sourced Llama so that their employees would stop using BitTorrent to leak it! Not to "keep the rabble quiet" but to appease their own developers.
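
    Back to the one-read-through point, here's the toy single-pass example I mentioned: a bigram count model that reads the corpus exactly once and is immediately usable. Everything in it is a toy (the corpus, the tokenization); it's only meant to show what a single read-through of the input data looks like, in contrast to the many epochs of gradient descent a Transformer needs.

    ```python
    # Toy single-pass training: a bigram count model sees each token of the
    # corpus exactly once and is immediately usable. Corpus and tokenization
    # are placeholders; the point is the single read-through.

    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ate".split()

    # One pass over the data, constant work per token.
    bigram_counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        bigram_counts[prev][nxt] += 1

    def predict_next(token: str) -> str | None:
        """Most likely next token under the bigram counts."""
        if token not in bigram_counts:
            return None
        return bigram_counts[token].most_common(1)[0][0]

    print(predict_next("the"))  # 'cat' -- it followed 'the' twice, 'mat' once
    ```

    Of course the count table grows with the vocabulary, which is exactly the space that Transformers and friends are trying to trade away at the cost of compute; that's the time-for-space tradeoff I'm calling expensive.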

  • Elon is an Expert Beginner: he has become proficient in executing the basics of the craft by sheer repetition, but has failed to develop meaningful generalizations.

    The original Expert Beginner concept was defined here in terms of the Dreyfus model, but I think it's compatible with Lee's model as well. In your wording of Lee's model, one becomes an Expert Beginner when their intuition is specialized for seeing the thing; they have seen so many punches that now everything looks like a punch and must be treated like a punch, but don't worry, I'm a punch expert, I've seen so many punches, I definitely know what to do when punches are involved.

  • There's a good insight in this armchair psychoanalysis. The typical narcissist is technically capable of performing the whole pretend-to-care-for-game-theoretic-reasons behavior, provided that there is an incentive for them. However, if Elon genuinely believes himself to be Christ or Buddha or Roy, then his abilities don't matter, because he will never have the incentive to deflate his beliefs and face his own limitations and mortality. In short, Elon's attitude can't be adjusted and his mental health will never improve.

  • It's almost completely ineffective, sorry. It's certainly not as effective as exfiltrating weights via neighborly means.

    On Glaze and Nightshade, my prior rant hasn't yet been invalidated, and there's no upcoming mathematics that tilts the scales in favor of anti-training techniques. In general, scrapers for training sets are now augmented with alignment models, which test inputs to see how well the tags line up; your example might be rejected as insufficiently normal-cat-like. (Rough sketch of that kind of filter at the end of this comment.)

    I think that "force-feeding" is probably not the right metaphor. At scale, more effort goes into cleaning and tagging than into scraping; most of that "forced" input is destined to be discarded or retagged.
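
    The filter I'm describing looks roughly like this hedged sketch. `alignment_score` is a stand-in for a real image-text model (CLIP or similar); the `ScrapedPair` type and the threshold are invented for illustration and aren't any particular lab's actual pipeline.

    ```python
    # Hedged sketch of an alignment filter over scraped image/tag pairs.
    # `alignment_score` is a stand-in for a real image-text model (CLIP or
    # similar); the threshold and ScrapedPair type are invented for
    # illustration, not any particular lab's pipeline.

    from dataclasses import dataclass

    @dataclass
    class ScrapedPair:
        image_path: str
        tag: str

    def alignment_score(pair: ScrapedPair) -> float:
        """Placeholder: a real filter would embed the image and the tag text
        and return their similarity. Stubbed to 0.0 to stay self-contained."""
        return 0.0

    ACCEPT_THRESHOLD = 0.25  # arbitrary cutoff for this sketch

    def filter_batch(batch: list[ScrapedPair]) -> list[ScrapedPair]:
        kept = []
        for pair in batch:
            if alignment_score(pair) >= ACCEPT_THRESHOLD:
                kept.append(pair)  # tag looks consistent with the image
            # else: drop it, or queue it for retagging
        return kept
    ```

    A poisoned "cat" that no longer scores like a cat simply gets dropped or retagged before training ever sees it, which is why I don't think the scales tilt back toward anti-training tools.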