|
|
Ode to a Spell Checker
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
On the "serious" side:
As a hobby, I have for the last few years collected Norwegian words that would cause trouble for Norwegian speech synthesis: One spelling can be pronounced in two or three different ways. (I never found one with four pronunciations, but they might exist.)
Sort of like steel guitar vs. lead guitar - they are both metals, right?
Where can I find a collection of English homographs that differ not only in meaning but also in pronunciation, causing speech synthesis problems?
|
|
|
|
|
You just taught me the word homograph and you're asking this question? An online search?
Wind is an easy one, though it might be disambiguated by context (noun or verb). And separate (adjective or verb). And...no! You'll end up keeping me here all day!
I could probably survive in Norwegian, so it would be interesting to see some of your words.
|
|
|
|
|
A few Norwegian examples:
Take "planet" as a three-pronunciation case: With stress on the second syllable, it is a planet (such as the earth). With stress on the first syllable: the horizontal plane. With that special Norwegian double stress: what the motorboat has done when it lies down to skim over the water, no longer stalling (the boat planed).
Or "urene": the watches (first-syllable stress) that might become dirty (u-rene, unclean) when you climb the scree slopes (double stress).
Or "rosen": I believe the name Rosén is an imported one, probably from French (but the accent is far from always included in writing). If Mr. Rosen gives you the "rosen" that you deserve, it could be either the rose flower (double stress) or the praise (first-syllable stress).
You could send a message to your butcher: "Lever lever, om du lever" (Deliver liver, if you are alive) - the first "lever" with second-syllable stress, the second with a short "e" and equal stress, the third with a long "e" and first-syllable stress.
"Kvitter", with first-syllable stress, is the sound of birds. Second-syllable stress: Sign it! Double stress: Get rid of, unload (verb in present tense).
"Hva koster koster på Koster?" could be read either as "What is the price of brooms at Koster?" (the Swedish islands) or as "What do brooms at Koster brush away?", depending on the vowel sound of the first "koster". In either case, the first two "koster" have double stress, the last one first-syllable stress.
There are not that many triple-pronunciation words.
A number of words vary in the sound of a vowel, like "bord": If the "o" is like an "å" sound (compare to English: bought), it is a border (in embroidery). With the clear "o" sound (rarely heard in English, like the physicist Niels Bohr), it is a table.
Many words and names have a French origin, often with stress on the last syllable, but have been established in Norwegian for generations, such as the name Andre (second-syllable stress). "Ikke Andre, men han andre" (not Andre, but the other guy): the second "andre" has stress on the first syllable. "Pioner" with last-syllable stress is a pioneer, while with second-syllable stress it is the peony (flower).
"Fordeler" could mean either advantages (stress on the first syllable) or distributor (stress on the second). So to tease those el-car fans, I had a T-shirt made that declared "El-biler har ingen fordeler", which may be read either as "El-cars have no distributor" or as "El-cars have no advantages".
Some composite words are spelled identically to non-composite words, so the meaning depends on whether you make a slight separation between the parts. Like "baksete" - back seat, or troublesome? "Forslag" - is that a proposal, or a kind of feed (for-slag)?
The (context dependent) interpretation of one word can affect the pronunciation of another: "fiber i kosten" could refer to nutritional fibers ("kosten" with an "å" sound) or to fibers in your brush ("kosten" with an "o" sound) - both interpretations could be valid, in different contexts.
Some words may be read with a long or short vowel sound. "Halt" with a short "a" means limping, but with a long "a" it means dragged. So "Gutten var halt, men han ble halt med" (the boy was limping, but he was dragged along) has two different pronunciations. "Vi spurte om veien, og måtte spurte videre" - with a long "u", "spurte" means asked, with a short "u", it means sprint ("We asked for directions, and had to sprint on"). "Han fikk salt hesten og ga den litt salt" (he had the horse saddled and gave it a little salt) - long "a" for saddled, short "a" for NaCl.
"Jeg er stolt av ham, har alltid stolt på ham" ("I am proud of him, have always trusted him") differs in both vowel sound and duration (proud: "å"/short, trusted: sharp "o"/long).
For some words, a consonant may be soft/disappearing or sharp: "Linda" with a clearly pronounced "d" is a girl's name. If you pronounce it as if it were written "linna", it is the tree, linden.
If you pronounce the final "t" in "foret", it is the past tense of "to feed"; if you suppress the "t", it is a noun, the feed that you give the animals. This may be combined with different stress patterns - in a number of words, first-syllable stress goes with a suppressed final "t" (a noun), while double stress and a pronounced "t" mark a verb in the past tense. But not without exception, of course...
Norwegian dialects vary a lot, and some words have identical pronunciations in some dialects, different in others. "Overlegen", the head doctor or to be autocratic: In south Norway dialects, the last syllable is pronounced with a clear "eeh" sound in both meanings. In north Norway, the head doctor is referenced with a clear "æ" sound, even sharper than the initial vowel of English "any". In some dialects, "tomt" has the same pronunciation for both meanings, "empty" and "patch of land"; in other dialects, empty is with an "å" sound, patch of land with an "o" sound.
The great majority of the "troublesome words" are those where one interpretation has that particular double stress pattern, not known in many languages. If you are into music, the best way to get a grip on it is to think of a double upbeat, as if you start to sing "Oh say can you see", but stop immediately after the "Oh-o". Usually, the two forms have a common root, with the double stress being either a passive form or past tense of a word, first-syllable stress being the noun, like "Reven var buret inne i buret" (the fox was caged in the cage). We have got hundreds of those pairs in Norwegian.
To illustrate the use of the words, and provide something that a speech generator could extract context / semantics from, I collect these words not as a plain list, but as a prose text (which has no intention of literary qualities; it is just to put the words into sentences). If you would like to practice your Norwegian, I'll send you my text file. But for a non-native Norwegian speaker, I guess reading it out loud without thorough preparation would be comparable to "English is tough stuff" (which I assume that you know - if not, google it!). If you are uncertain about the pronunciation, I'll gladly assist you!
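For the programmers here, a minimal sketch of what "extracting context" could mean for the "halt" sentence above. The rule (look at the word just before "halt") is a toy assumption that fits this one sentence only, not a description of Norwegian grammar or of any real speech synthesizer, and the names are made up for illustration.

```python
import re

# Toy rule for this one example sentence only (an assumption, not real grammar):
# after "var" read "halt" as the adjective, after "ble" as the participle.
VOWEL_BY_PREVIOUS_WORD = {
    "var": "short a (limping)",
    "ble": "long a (dragged)",
}

def annotate_halt(sentence):
    """Return one reading per occurrence of 'halt', or 'unknown' without a cue."""
    words = re.findall(r"\w+", sentence.lower())
    readings = []
    # Pair every word with the word before it; the first word gets "" as its predecessor.
    for previous, word in zip([""] + words, words):
        if word == "halt":
            readings.append(VOWEL_BY_PREVIOUS_WORD.get(previous, "unknown"))
    return readings

print(annotate_halt("Gutten var halt, men han ble halt med"))
```

A single neighbouring word is of course far too little in general, which is why I collect the words inside full sentences.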
|
|
|
|
|
Thanks. I think my mother, who grew up in Kongsberg, will enjoy this.
Putting that accent on e's (Rosén) is fairly common in Sweden. When I first saw it, I thought it was an affectation.
Dialects indeed. Many years ago, I was sitting outside a cafe and thought the people at a nearby table were German tourists. After a while, I realized that they were speaking Norwegian. From Bergen, so it must date back to the Hanseatic League. I've also heard that visiting Icelanders thought folks on some outer, northern Norwegian islands spoke Icelandic a bit strangely!
|
|
|
|
|
Greg Utas wrote: I think my mother, who grew up in Kongsberg, will enjoy this.
Drønn fra fjellet, sus fra skogen
vekker bergstaden ved Lågen.
Arbeidslivets våpengny
får i fosselarmen ly.
Det er Kongsberg! Det er Kongsberg!
Sølvomspunnet -
<*> <*> <*>
Byen hvortil vi er bundet.
I am sure your mother will teach you the tune
... If you are going to teach kids "The Kongsberg song" today, they could benefit from a history lesson or two ... The last silver mine was closed down some seventy years ago; no one living today has heard those "drønn fra fjellet" (booms from the mountain) or seen the town as "sølvomspunnet" (spun in silver). The arms factory, for 150 years Kongsberg's cornerstone, was dissolved more than thirty years ago. (Among the scraps there is still some arms activity, but lots of it has actually moved to other places.) What is today a hydropower dam in the river was once a thundering waterfall under the old bridge (Gamlebrua), where the fossekall (the national bird of Norway, the white-throated dipper) had its nest behind a curtain of falling water.
You can still hear the "sus fra skogen" (winds blowing in the forest), but the rest is long gone ... Nostalgia isn't what it used to be ...
|
|
|
|
|
I've never heard this and certainly had to look up some words. And at first I puzzled over vekker, reading it as "weeks", because I used to alternate between Swedish and English until I was about 4 years old.
|
|
|
|
|
|
|
Why were you alternating between Swedish and English until the age of four?
If you have Swedish and English speaking parents, you don't stop at the age of four.
And if you had English speaking parents, but lived in Sweden, you would normally not have been alternating that much.
|
|
|
|
|
My father was born in Sweden and my mother in Norway. They met in Sweden and emigrated to Canada, where I was born. So I spoke both Swedish and English until deciding that I'd better stick with English before starting school! I'm now better at Norwegian but Swedish sometimes creeps in. While visiting Gotland, I chatted with a woman who thought my Swedish was just fine, but I told her she was just being kind!
|
|
|
|
|
Ah, that's a combination I didn't think of. No wonder you're so knowledgeable of Scandinavia.
|
|
|
|
|
A story about Swedish and Norwegian - I know it is true, because it was experienced by the father of a classmate of mine: He was "turistsjef" (manager of tourist oriented activities in our town), in charge of a pan-Scandinavian conference of people with similar activities:
If you meet grown people from Norway's Sognefjord, they sometimes speak a dialect that "no" Norwegian can understand a word of (and this happened 40+ years ago, when dialects were far more pronounced). Sweden has the same "problem" with elderly people from the Skåne district; some of them have a dialect which is almost like a different language.
The distance from Sognefjord to Skåne is around 700 km in a straight line, at least a thousand km along the road, so there are not (and have never been) any obvious direct communication paths between the two districts. Yet it turned out that the Sogn dialect deviations from standard Norwegian were so similar to the Skåne dialect deviations from standard Swedish that the Sogn and Skåne representatives were chatting away, having no problems understanding each other. Other Norwegians, and other Swedes, standing ringside, didn't understand a word of what either of them was saying...
I guess that an essential part of the explanation is that both dialects had preserved essential elements from the same proto-Scandinavian Norse language of the viking ages, keeping alive a number of sounds, conjugations and inflection patterns that disappeared from modern Swedish/Norwegian hundreds of years ago.
Most likely, both Sogn and Skåne dialects have been so watered down by the official national languages that today, you wouldn't have the same experience if Sogn and Skåne people met - they would understand each other, and be understood by the ringsiders, because their languages would be far closer to the national "standard" languages.
|
|
|
|
|
The Skåne dialect and pronunciation were quite influenced by Danish, which when spoken with a heavy native accent sounds like a throat disease, not a language. I heard a story about someone from Skåne asking a Stockholm resident for directions. They ended up using English to communicate.
When in Denmark, I could read things about as well as I can read Norwegian, but anything spoken was hopeless. On the other hand, my aunt married a Dane, and they sometimes spoke Danish. His accent was mild enough that I could understand them.
|
|
|
|
|
(Many) years ago I was cycle touring on my own in northern Norway, not long after living near Trondheim for 3 months. Got chatting to a local whilst awaiting a ferry. After a while she asked what part of Sweden I was from. Never having been "good" at languages this was the biggest compliment anyone could pay!
|
|
|
|
|
Now deceased comedian Harald Heide-Steen Jr. used to impersonate a Russian submarine captain caught in Norwegian waters ("But we can't see that border underwater!"), speaking Norwegian with a heavy Russian accent. One Russian language expert identified from this accent where Heide-Steen Jr. had learned his Russian, and was very surprised to learn that Heide-Steen Jr. didn't know a word of Russian. He had learned the accent without learning the language.
There are also a few Norwegian singer-songwriters who are extremely good at talking gibberish that doesn't sound like gibberish, usually either in Norwegian or English. Usually it starts out as meaningful words/sentences, but somewhere in the middle of it - you can't tell exactly where - you lose your grip on it. It still is like the chatting you can hear from the neighbouring table at the café. It is the same language; you just can't make out the words.
That is sort of the opposite of what you describe, but I think it takes much of the same abilities, "having an ear for" (is that a valid way to phrase it in English?) the language.
|
|
|
|
|
That's quite an achievement! You could probably have answered Jämtland and been believed.
|
|
|
|
|
|
Thanks for the hints.
The problem with searching for homographs in a language that is not your native one is that often you do not know the two or more different pronunciations; you mispronounce some of the meanings. Actually, I did that myself once, laughing at this record sleeve identifying the lead (heavy metal) guitar - I honestly thought that it was a joking way to say "heavy bass guitar". My friends laughed at my pun, and only much later did I discover what "lead guitar" really means (and how it is pronounced). My friends never discovered that I really made a fool of myself rather than making a pun ...
A non-native speaker may encounter exactly the problem I am addressing: If you feed a word into a speech synthesis module to learn its pronunciation, you usually are given one single alternative. So you might "learn" that the wind blowing and to wind your watch should sound the same. To learn the difference, you need a dictionary that provides an IPA (phonetic alphabet) version for each meaning, so you can see the difference (if there is one). And then there are dialects - I am sure that most languages have dialects distinguishing between words that other dialects pronounce a single way. So, to make an exhaustive (or even extensive) list of homographs with different pronunciations requires strong familiarity with the language.
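To make the "one single alternative" problem concrete, here is a minimal sketch of a dictionary that keeps one IPA entry per meaning. The two entries, their sense labels and all function names are illustrative assumptions of mine, not the API of any real dictionary or speech module; the transcriptions are the common textbook ones and ignore dialects.

```python
# Hypothetical mini-lexicon: each spelling maps to several senses, each with its own IPA.
HOMOGRAPHS = {
    "wind": [
        {"sense": "moving air (noun)", "ipa": "wɪnd"},
        {"sense": "to turn or coil (verb)", "ipa": "waɪnd"},
    ],
    "lead": [
        {"sense": "the metal (noun)", "ipa": "lɛd"},
        {"sense": "to guide, to be in front (verb)", "ipa": "liːd"},
    ],
}

def naive_pronunciation(word):
    """Mimic a module that returns one single alternative: simply the first entry."""
    entries = HOMOGRAPHS.get(word.lower())
    return entries[0]["ipa"] if entries else None

def all_pronunciations(word):
    """Return every (sense, IPA) pair, so a learner can see the difference."""
    return [(e["sense"], e["ipa"]) for e in HOMOGRAPHS.get(word.lower(), [])]

def has_several_pronunciations(word):
    """True if the spelling is listed with more than one distinct pronunciation."""
    return len({ipa for _, ipa in all_pronunciations(word)}) > 1

for sense, ipa in all_pronunciations("wind"):
    print(f"wind /{ipa}/ - {sense}")
```

The naive lookup is where a learner goes wrong: it answers for "wind" without ever revealing that a second reading exists.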
|
|
|
|
|
"Led" guitar in a heavy metal band. Maybe they were influenced by Led Zeppelin.
|
|
|
|
|
I had such a list, collected with a willing helper, from when I worked for the US Department of Energy.
It started with 'wind' and 'lead' but there turned out to be a surprisingly large list of them.
Amazingly, one of the early versions of the list was sitting under my monitor - right here!
project, live, polish, unionized, bass, sow, bow, content, record, present, close, minute, dove, use, combine, row, refuse, console, invalid, incense, flower, periodic, read
The majority have related meanings but are pronounced differently depending upon the context. Some are chemistry related (unionized read as un-ionized, periodic read as per-iodic). Others are totally mysterious until they have some context: is polish a nationality or the act of shining something with mild friction?
The main list had more of the fun ones, like wind, polish, sow, console, bass, etc.
Another part of word fun is with very different meanings in other languages. We had a visiting professor from Brazil who went hysterical with laughter on his first pay day - it sounded to him like a day dedicated to flatulence. "Exxon" was a word found by what passed for a computer search for a word that meant nothing in any other language. It used to be "Esso". Chevrolet got into trouble with its "Nova" in Spanish-speaking countries, where it translates to "no go".
Conclusion - and ancient wisdom: the only way to avoid trouble over what you say is to keep your mouth shut.
Ravings en masse^ |
---|
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein | "If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol Mar 2010 |
|
|
|
|
|
W∴ Balboos wrote: there turned out to be a surprisingly large list of them.
Exactly like I experienced with Norwegian. I am still collecting, but I think I have caught most of them by now!
Thanks for your contributions. I guess I will start collecting English ones now - but I guess that for several of them, I will have to look up the proper pronunciation.
Another famous example of a product/company name explicitly chosen to have no meaning (in particular, no offensive or derogatory one) in any known language is "Kodak".
In translation between languages you frequently encounter "false friends". If a Norwegian refers to "eventual problems", he usually means "possible problems" or even "problems not likely to happen". I've got a small dictionary of Norwegian-English false friends. But even languages as close as Swedish and Norwegian (we rarely care to translate, except for formal documents) have false friends. If I write to a business partner "Jeg har ikke anledning til å møte deg", I regret that I won't have any opportunity to meet you; the Swede will read it as "I do not have any reason to meet you".
Then there are those English vs. English stories... The classic book "Big Business Blunders: Mistakes in International Marketing" tells about a joint project between a British and a US company. The cooperation was rather unsuccessful. For one joint project meeting the management agreed that "these problems be tabled". Problem was that to one party, "tabling" a problem meant putting it face up on the table, for everybody to see, to solve the problem. To the other party, "tabling" meant laying it face down on the table, not bringing it up, keeping it down. One party got cross because the other party seemed to refuse to take the discussion that they had agreed upon; the other party got cross because they had agreed to put those problems aside for that meeting, yet the other side kept pushing.
... Today, I think I would have been more fascinated by working professionally with natural languages than with programming languages ....
|
|
|
|
|
One of my absolute very favorites: Gift
In English, it's nice to receive a gift.
In German - not so much
Add Australian English (is it still English?) for some twists as interesting as their fauna.
|
|
|
|
|
In Norwegian, gift is (like in German) something you should not consume. Gift is poison.
And, being married is to be "gift".
Draw whatever conclusion you want.
|
|
|
|
|