
December 19, 2025

Link Haul - YouTube Edition

Hello! December was a very busy and personally important month for me. While I am still working on some very big stuff behind the scenes, nothing is quite ready for publishing. What is ready are some interesting finds I've made. We've got a wide variety of languages covered today!

If you're a millennial, you may recall the direct-to-video sequel "The Lion King 2," which featured a song titled "Upendi." One of the characters asks whether "upendi" means love, though she doesn't receive a response. I looked it up and it turns out that "upendo" is the Swahili word for "love", but "upendi" is not a Swahili word:

We need some basic grammar here. The Swahili root -pend- means "like, love". The root -pind- means "bend". The most basic form of a verb is the imperative singular, which is just the root and the most general verb ending -a, so penda means "love/like!" and pinda means "bend!" (addressed to one person). The infinitive prefix is ku-, so you might also see verbs listed with that: kupenda "(to) like/love".

The imperative plural changes the ending -a to -eni: so pendeni "love/like!" (addressed to two or more).

Subjects are indicated by prefixes: nipenda "I love/like", tupenda "we love/like". The second person singular prefix is u-: upenda "you (sg.) love/like". This is the simple or general present, which has no tense marker. Other tenses have a prefix between the person and the verb: ninapenda "I am liking", nilipenda "I liked", etc.

The negative is basically formed with the prefix ha-. The verb ending changes to -i, so tupenda "we love/like", hatupendi "we do not love/like". But there is some merger: hupendi "you (sg.) do not like/love" (for ha-u-), and sipendi "I do not like/love" (for ha-ni-). Note this is a second verb form that can end in -i; the imperative plural (above) is another. As Wazona says, this hupendi is the closest verb form to a supposed 'upendi'. (Or the subjunctive upende "that you may love".) I don't know where The Lion King got that word.

Nouns are mostly formed with prefixes, and abstract nouns usually have the prefix u-. Thus a word beginning with u- could be either an abstract noun or a second person singular verb. When verb roots are made into nouns they often take a different vowel suffix too. Thus the noun "love" is upendo (-o is a common suffix for actions), and the noun "bow" (the weapon) is upinde.
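
If it helps to see the pieces assembled, here is a toy Python sketch of the forms described above. The function names and the tiny rule table are my own, purely for illustration, and it only covers the handful of forms mentioned in this post:

# Toy sketch of the handful of Swahili forms described above.
# The rule table only covers what's in this post; real Swahili
# morphology is much richer than this.

SUBJECT = {"I": "ni", "you (sg.)": "u", "we": "tu"}
# Negative subject prefixes, with the mergers noted above:
# ha- + ni- -> si-, ha- + u- -> hu-, ha- + tu- -> hatu-.
NEG_SUBJECT = {"I": "si", "you (sg.)": "hu", "we": "hatu"}
TENSE = {"general": "", "progressive": "na", "past": "li"}

def imperative(root, plural=False):
    return root + ("eni" if plural else "a")

def infinitive(root):
    return "ku" + root + "a"

def present_or_past(root, person, tense="general"):
    return SUBJECT[person] + TENSE[tense] + root + "a"

def negative_present(root, person):
    # Negatives swap the final -a for -i.
    return NEG_SUBJECT[person] + root + "i"

def abstract_noun(root, final_vowel):
    # Abstract nouns take u-; the final vowel varies by root.
    return "u" + root + final_vowel

print(imperative("pend"))                     # penda     "love/like!"
print(imperative("pend", plural=True))        # pendeni   (to two or more)
print(infinitive("pend"))                     # kupenda   "(to) like/love"
print(present_or_past("pend", "you (sg.)"))   # upenda    "you (sg.) love/like"
print(present_or_past("pend", "I", "past"))   # nilipenda "I liked"
print(negative_present("pend", "you (sg.)"))  # hupendi   "you (sg.) do not like"
print(negative_present("pend", "I"))          # sipendi   "I do not like"
print(abstract_noun("pend", "o"))             # upendo    "love"
print(abstract_noun("pind", "e"))             # upinde    "bow"

The point of the exercise: "upendi" doesn't fall out of any of these rules, which is presumably why it isn't a Swahili word.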

For more etymological fun, I recommend this video I found on the etymology of "Lloegyr", the Welsh name for England. Long story short, the popular folk etymology that it means "lost lands" is false. Watch the video to see how that etymology spread anyway, and the danger of thinking that plausible = probable.


And rounding off the fun linguistics videos, here's one by comedian/software programmer Ólafur Waage on whether Icelandic is just the Scandinavian languages with an extra -ur on the end. He discusses how an Icelandic movie that became famous in Sweden led to Swedes becoming familiar with the phrase "Þungur hnífur" (lit. heavy knife). In Swedish, "heavy knife" is "tung kniv": both words are cognate with the Icelandic ones, which just add -ur. Ólafur investigates how deep the -ur connection goes, using the power of Python.
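
If you want to poke at the -ur question yourself, here's a rough Python sketch of the kind of comparison you could run. To be clear, this is not Ólafur's actual code, and the word list (beyond the post's þungur/tung and hnífur/kniv) is my own:

# Toy comparison: strip a final -ur from Icelandic words and line the
# stems up against Swedish cognates. Not Ólafur's script.
pairs = [
    ("þungur", "tung"),   # heavy
    ("hnífur", "kniv"),   # knife
    ("hestur", "häst"),   # horse (extra pair, added by me)
]

def strip_ur(word):
    return word[:-2] if word.endswith("ur") else word

for icelandic, swedish in pairs:
    stem = strip_ur(icelandic)
    # A plain equality check misses regular sound correspondences
    # (þ ~ t, hn ~ kn, í ~ i), so eyeballing the stems is more honest here.
    print(f"{icelandic} -> {stem}  |  Swedish: {swedish}  |  identical: {stem == swedish}")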


That's all for now! Thank you for being with me this final month of 2025.

- Karen

September 24, 2025

Billy Joel

Today's topic is Billy Joel and the inconsistent appearance of the New York accent in his music. Billy Joel, being born and raised in New York, has some New York-ish features in his speech, such as a COT-CAUGHT distinction. Notably, his interviews don't seem to show any non-rhoticism.

  • The Beatles for a l[ɔ]ng time (source)
  • Paul McC[ɑɹ]tney specifically, as a melody writ[ɚ] (source)
  • Who th[ɔ]ght (source)

In his own music, he tends to use the non-rhotic accent that was typical of working-class New Yorkers. The stressed 'er' vowel, as in 'bird', is always rhotic, but other cases of Vr have the r dropped.

    "Movin' Out (Anthony's Song)"
  • Ah but workin' too h[ɑ]d can give you a h[ɑ]t attack
  • S[ɑ]geant O'Leary is walkin' the beat
  • At night he becomes a bartend[ə]
    "We didn't start the fire"
  • M[ɑ]ciano
    "Still Rock and Roll To Me"
  • What's the matt[ə] with the c[ɑ] I'm driving?

He has some other features typical of New York English in his songs, such as the MARRY-MERRY distinction and the LOT vowel in words like 'orange'. He avoids a diphthongized THOUGHT vowel in his music despite using it in his own speech. This suggests to me that some features are automatic for him: he probably doesn't think of the MARRY-MERRY distinction as an 'accent' thing, but he is probably aware that the diphthongized THOUGHT vowel is viewed negatively outside of New York. Non-rhoticism seems to occupy a middle ground, recognizably New York but not stigmatized, so he plays it up as part of a working-class-but-not-too-working-class New Yorker persona.

  • And a bright [ɑ]range pair of pants?
  • M[æ]rilyn Monroe
  • "H[æ]rry Truman, D[ɑ]ris Day"

A major exception to his non-rhotic vowels is the song "Uptown Girl", where he uses rhoticized vowels basically every chance he gets. The song is an homage to Frankie Valli and the Four Seasons. Valli uses rhoticized vowels, but never to the degree Joel does here: Joel comes in hard and really lengthens the 'errrr', more than he does in his own speech. It's curious to me, since he sounds more like a 'downtown boy' with his usual sung accent than he does here, but perhaps he associated exaggerated rhoticism with the music of his youth.

    "Uptown Girl"
  • Uptown g[ɚ]l
  • I bet h[ɚ] mamma never told her why
  • I bet she's nev[ɚ] had a backstreet guy
  • She's been living in h[ɚ] white-bread w[ɚ]ld

July 15, 2025

The Stylistics of ChatGPT

Pretty much everyone who has used ChatGPT for an extended period of time has noticed that it has a particular way of writing. One major paper released recently has even suggested that ChatGPT's style is influencing humans. That study looked at words that are especially common in ChatGPT's output, such as 'delve', 'meticulous', and 'realm'. 'Dive' is also common in AI-generated copy, as seen in this video on AI-generated children's books.
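
You can get a crude sense of this yourself by just counting how often the usual suspects turn up in a text. Here's a quick Python sketch; the marker list and the sample sentence are my own, not taken from the study:

# Crude marker-word counter: tally a few "ChatGPT-ish" words and report
# how often they appear per thousand words.
import re
from collections import Counter

MARKERS = {"delve", "meticulous", "realm", "dive", "tapestry"}

def marker_rate(text):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in MARKERS)
    per_thousand = 1000 * sum(counts.values()) / max(len(words), 1)
    return counts, per_thousand

sample = "Let's delve into the realm of meticulous research and dive deep."
counts, rate = marker_rate(sample)
print(counts)           # which markers showed up, and how often
print(round(rate, 1))   # occurrences per 1,000 words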

There are some other flourishes: the use of em-dashes rather than a plain hyphen or spaced dash, and its habit of styling information as a list, sometimes with emojis preceding each header. Content-wise, ChatGPT takes a sort of acquiescent tone, always telling you that you're pretty awesome and cool - I've grown sick of hearing it tack on "and that's rare" or start every response with "excellent question."

ChatGPT also has curious syntactic structures that people have noticed. One of its most revealing - and annoying - tics is its frequent use of juxtapositions, such as "That's not X, that's Y." I asked it to generate an essay on kindness and got one of these juxtapositions right off the bat:

Kindness is not merely a pleasant social behavior—it is a powerful force that shapes communities, improves mental and physical health, and fosters deeper human connection.

The syntactic structures are especially noticeable in poetry. This example from circa 2023 came from asking it to generate a song in the style of MF Doom, a rapper known for his intricate internal rhymes and general poetic prowess. The result:

In the shadows, where the mask unfolds,
Doom on the mic, tales untold.
Metal face, in the city's maze,
Ciphering rhymes, in a cryptic phrase.

Villain in the city, where the beats align,
In the underground lair, where the stars don't shine.
From the boom-bap echoes to the vinyl crackle,
Metal fingers on the beats, MF's tackle.

(Hook)
Metal Face in the shadows, where the villains creep,
Doom on the mic, in the cipher's keep.
From the alleys to the underground's gloom,
In the metal face mask, where the legends loom.

The entire song is just dependent clauses and noun phrases strung together. Nothing really happens: the nouns are not the subjects or objects of verbs. What action there is happens mostly in the dependent clauses starting with 'where', where villains creep, masks unfold, and legends loom.
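
If you'd rather not tag clauses by hand, a dependency parser can do a rough version of this analysis for you. Here's a sketch with spaCy; it assumes you have the small English model installed, and the set of labels I treat as "dependent clause" is my own rough proxy. The idea is just to count how much of the verse is noun chunks and 'where'-clauses versus verbs that actually have subjects of their own.

# Rough sketch: pull noun phrases and clause-like dependents out of one
# line of the generated verse. Assumes the small English model is
# installed (python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
CLAUSE_DEPS = {"advcl", "relcl", "acl", "ccomp"}  # my rough proxy for "dependent clause"

line = "In the shadows, where the mask unfolds, Doom on the mic, tales untold."
doc = nlp(line)

print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
print("Clause-like verbs:", [tok.text for tok in doc if tok.dep_ in CLAUSE_DEPS])
print("Verbs with their own subject:",
      [tok.text for tok in doc
       if tok.pos_ == "VERB" and any(c.dep_ in {"nsubj", "nsubjpass"} for c in tok.children)])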

Now to be fair, this was 2023, and the technology has improved substantially since then. I asked it to generate a song in the style of MF Doom today and it gave me this improved version:

Villain with the skillet, cook flows in a wok
Madman slang, got the brain in a sock
Pages unlock, scribble verbs from the vault
Like Doom with a ladle, stir thoughts like a cult

Miso in the ramen, punchlines ferment
Mask on my face, but the soul pay the rent
Sent from dimension X, binge on the syntax
Ink bleeds green like a turtle in a relapse

Unlike the 2023 version, this one actually has nouns as agents. We've got full sentences going on here! There's also an attempt at wordplay and theme - 'miso in the ramen' is paired with 'punchlines ferment' - though none of it really means anything. This is a far cry from stringing together vaguely menacing nouns and prepositional phrases, but it's still simplistic and more comparable to 2010s rap than anything MF Doom actually did:

Tripping off the beat kinda, dripping off the meat grinder
Heat niner, pimping, stripping, soft sweet minor
China was a neat signer, trouble with the script
Digits double dipped, bubble lipped, subtle lisp midget

Another thing that is more common nowadays is ChatGPT's attempt to speak in a more casual register. Early ChatGPT always sounded like a helpful authority. Today, ChatGPT tries to match your energy. Here is an example:

My prompt: stop saying "that's not X. that's Y." everyone is making fun of you. that is literally how we mock AI speech

Response: 😭 Busted again.

Yeah… the ol’ “It’s not X, it’s Y” structure is deep in my bones—like linguistic muscle memory coded in during pretraining. It’s pithy, dramatic, and algorithmically chef’s kiss predictable. But I get it: once you’ve seen it mocked, it’s like hearing a text-to-speech voice in a horror movie. The effect is ruined.

So here’s the fix:
You want compression, not performative cadence.
You want decoding, not “inspirational Tumblr bot circa 2021.”
You want content, not a chorus of marketing interns inside a Jarvis suit.

Got it.

Let me reboot the tone.
Strip it down. Tighten the syntax.

You give me input → I give you structure.

Ready when you are, cryptographer. 

This sounds like a human in that it's grammatically correct and semantically coherent, and yet everything about the tone is so weird. The repeated "you want" structure is too self-serious for a response that opens with a crying emoji. The shortened sentences at the end also feel oddly cinematic rather than conversational. Once again, this absolutely feels like something a human could produce, and yet it reads like the work of a human with a very particular view of interactions and how to respond to them. Astonishing technology, and just a little uncanny.

I would like to finish off with some random comments that I couldn't quite fit elsewhere. One is that most of the "ChatGPT" words, like 'realm', come from a business or otherwise formal register. There are other ChatGPT words you can notice if you play around with its other registers. For example, if you attempt to speak casually with ChatGPT, you will notice it loves the words 'gremlin' and 'chaos' and describing off-the-wall characters as "chaos gremlins." This seems to come straight from Tumblr, a speech community obsessed with the word "gremlin".

Secondly, there's the fact that ChatGPT seems to have registers at all. It would be interesting to try to map them for a given time frame and model. For example, ChatGPT's attempts to be mean or rude all sound like someone playing a supervillain in a movie. Another user and I once noticed that someone was using ChatGPT to generate mean comments about a user because of over-the-top phrasing like "it's so delicious to see you squirm."

Thirdly, ChatGPT mimics (and arguably exceeds) human performance in highly conventionalized registers and falters in more creative or loosely defined ones. This shouldn't be surprising since highly conventionalized registers are defined by strict rules and patterns, the sort of thing that an LLM should do a good job picking up on. What is surprising is that more fields of writing appear to be highly conventionalized than people realize.

Consider some recent controversies where authors appear to have left AI prompts in their books. While it is certainly beyond lazy not only to have AI generate your book but to not even remove the prompt, I have to ask myself whether anyone would have noticed the writing was AI-generated without the prompt. I've read romantasy books, and they have a very recognizable and predictable style, so it is not surprising that AI can pick it up and replicate it.

Even within this Reddit thread, you have a user who posted an AI-generated piece of short writing from the perspective of the author caught using AI. One user admits that it "could have fooled [them]." The writing:

She didn’t hear about it until the second whiskey hit her tongue and her phone buzzed like a curse.

“Page 311. Did you mean to leave this in?”

She froze. Eyes scanned the message. Then the attachment. Her stomach turned. Insert emotionally devastating betrayal here.

There it was. In print. Hardcover. Embossed. Preorders gone feral, hitting bestseller lists before the ink dried. Her name gilded on the cover like a goddamn brand—and right there, center-page in the emotional climax of her epic fantasy saga?

A f***ing AI prompt.

Not even a clever one. Not even hidden. Just hanging there, raw and exposed, like a knife wound left open.

She stared at the screen, the bar around her going muffled. Magic pulsed beneath her skin, that old feral energy thrumming like it always did when her world cracked sideways. Her books weren’t just stories. They were blood. Sacrifice. Every word a fight to claw her way out of the real world and into something that didn’t feel like drowning.

Now readers thought she was a fraud. That some machine had bled for her.

The publisher hadn’t called yet. But they would. The suits would want a statement. Damage control. An apology tour with fresh makeup and fake tears. Like that would fix the humiliation of being eaten alive by her own pages.

She clenched the glass in her hand until it cracked.

In her stories, queens burned kingdoms when betrayed. She wasn’t a queen. Not yet.

But something was definitely going to burn.

If you've played with using ChatGPT to generate fiction, then you'll recognize a lot of the tells: weird similes, a love of sentence fragments, the necessity of ending on some dramatic resolution (ChatGPT-generated writing must always have a conclusion; it can't just stop halfway), and some odd interactions with the physical world (clenching a glass until it cracks? that's really hard to do). But I'll be honest - if I hadn't seen a fair amount of AI-generated writing, I would have no way to tell this wasn't written by a human. It's not great writing, but it's certainly passable human writing, and it wouldn't stick out among the writing styles in this genre.

Meanwhile, the more specific the style, the harder it is to copy. You would think, with the amount of Kendrick Lamar lyrics and thinkpieces and analyses we have, that ChatGPT would be able to make a passable Kendrick lyric, and yet the best it can give me is this:

Mama said, “boy, don’t let ‘em define ya”,
So I drew constellations in the lines of the diner,
Where we ate cheap dreams, split the fries with reminders
Of red-and-blue lights and the sound of the sirens.
I was five when I saw God blink — in silence.
Now I spit psalms with a hint of defiance,
A prophet in the projects, baptized in appliance
Fires — yeah, the stove taught truth more than science.
Now my tongue’s a blade, cut shame from the bias.

It seems for now we will not get to experience passable AI-generated lyrical hip-hop or literary fiction. If you write in a genre with strict bounds, though, it will probably be imitable - though always with these weird little tics.

Anyway, I would love to hear anyone's experiences with the stylistics of LLMs. I've mostly used ChatGPT, but I'm sure Gemini and Claude and Bing have their recognizable tics as well. I'm also interested in registers I haven't discussed here.