How many of the 170k English words do you know?

As others have pointed out, too many clicks per word. I am a sucker for a 'how many words do you know' quiz so I finished anyway. Overall I'm skeptical of the classifications. In broad strokes, the early words are easier and the later words are more challenging, but the middle is pretty muddied.

Some of the words chosen are rather absurd/inappropriate: breviary (which I got wrong but felt like a vaguely religious word) was characterized as intermediate but I think it's much more obscure and less obvious than that; Hippopotomonstrosesquippedaliophobia was used as a word (I got that wrong as well) - any type of 'phobia' word is really the sort of thing a fourth grader opens up a page in the dictionary and points out, not a word that is used... ever; metamorphosis and kinetic were labeled expert, which I don't agree with (what elementary schooler doesn't learn about the metamorphosis of a caterpillar into a butterfly? what high schooler doesn't learn about kinetic energy?).

Most words were reasonably well defined in a way that most people would understand or recognize. A few words had poor definitions: lethargy ("the state of being lethargic" - obvious); complacent ("smug satisfaction with oneself" - I disagree that complacency is intrinsically smug); magnanimous ("generous toward a rival" - I disagree that a rival must be involved); gauche ("socially awkward" - this is sort of close but the given definition completely misses the idea of being tactless).

They call it scientific and give a hand-wavey formula, but they don't explain how words are stratified in the first place. If stratified sampling is a formally recognized method of doing this, it would be nice to have a link to a real reference. I think I know a lot of words, but I am skeptical of the estimate this app provided (north of 75k).

Interesting concept, but 100 words is really quite a lot to get through... It's tiresome trudging through the easy words at the start, and I never got to see the interesting words before getting bored.

I've seen other systems like this calibrate far more quickly by assigning a sort of score and confidence behind the scenes. Confidence starts out low and increases over time - correct/incorrect answers rapidly adjust score at the beginning, then things settle down.

In practice this means you get a sequence of increasingly uncommon words initially, until you get one wrong, then you drop back to something easier until you start getting things right again, and eventually circle around words at your level.
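To make the calibration idea above concrete, here is a minimal sketch of a staircase-style loop. Everything in it is an assumption for illustration: the function name `estimateByStaircase`, the starting rank, the step sizes, and the idea that words are indexed by frequency rank. It is not how this site or any particular test works.

```javascript
// Hypothetical adaptive loop. `knows(rank)` stands in for showing the user a
// word of the given frequency rank and recording whether they got it right.
function estimateByStaircase(knows, maxRank = 85000, questions = 30) {
  let rank = 1000;        // start with a fairly common word
  let step = maxRank / 4; // big jumps while confidence is still low
  let lastCorrect = null;

  for (let i = 0; i < questions; i++) {
    const correct = knows(rank);
    // A reversal (right after wrong, or wrong after right) means we just
    // crossed the user's level, so halve the step (with a floor of 50).
    if (lastCorrect !== null && correct !== lastCorrect) {
      step = Math.max(step / 2, 50);
    }
    // Move to rarer words after a correct answer, easier ones after a miss.
    rank = correct ? rank + step : rank - step;
    rank = Math.min(maxRank, Math.max(1, rank));
    lastCorrect = correct;
  }
  return Math.round(rank);
}

// Simulated user who knows every word more common than rank 30,000.
const staircaseEstimate = estimateByStaircase((rank) => rank <= 30000);
console.log(staircaseEstimate);
```

With the simulated user, the loop overshoots once, drops back, and then circles ever more tightly around rank 30,000 within 30 questions. A real version would also have to cope with lucky guesses and misclicks, which this sketch ignores.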

Also - too many clicks per word. It's low stakes, just let me click the definition once and I'll live if I misclick (or add an undo button).

In addition to everything everyone else has said: their math is off by half (or 100%, depending on how you count), due to a structural error.

(context: native English speaker, big reader, huge nerd, perfect SAT score)

I got all 100 correct on the first try without looking anything up! Confusingly, that only resulted in a "SCIENTIFIC ESTIMATE" that I know 85,000/~170,000 words?

Their "How is this calculated" page that appears at the end explains their error:

> According to the Oxford English Dictionary (Second Edition), there are approximately 171,476 words in current use.

> We use Stratified Sampling. Instead of testing random words, we divide the language into 5 distinct difficulty bands based on frequency of use:

> 1. Core Basics ~3,000 words
> 2. Intermediate ~7,000 words
> 3. Advanced ~10,000 words
> 4. Expert ~25,000 words
> 5. The Obscure ~40,000+ words

> If you answer 2 out of 3 'Intermediate' questions correctly, we estimate you know roughly 66% of the 7,000 words in that band.

> Total Score = Σ (Accuracy in Band × Band Size)

Their strata add up to 85,000, not ~170k, so a perfect score still comes out at only about 50%.
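For anyone who wants to check the arithmetic, here is a small sketch of the quoted formula. The band sizes are the ones from the quoted page; the function name `bandWeightedTotal` and the 20 questions per band are my assumptions (20 matches the per-band breakdowns people have posted in this thread).

```javascript
// Band sizes as quoted from the site's "How is this calculated" page.
const BANDS = [
  { name: "Core Basics", size: 3000 },
  { name: "Intermediate", size: 7000 },
  { name: "Advanced", size: 10000 },
  { name: "Expert", size: 25000 },
  { name: "The Obscure", size: 40000 },
];

// Total Score = sum over bands of (accuracy in band * band size).
// `correct[i]` is the number of right answers in band i; `perBand` is the
// number of questions asked per band (assumed to be 20).
function bandWeightedTotal(correct, perBand = 20) {
  return BANDS.reduce(
    (total, band, i) => total + (correct[i] / perBand) * band.size,
    0
  );
}

const perfectScore = bandWeightedTotal([20, 20, 20, 20, 20]);
console.log(perfectScore);                        // 85000
console.log((perfectScore / 171476).toFixed(3));  // "0.496"
```

So even with every answer right, the estimate cannot exceed 85,000, which is just under half of the 171,476 figure the page cites.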

They're also using a pretty limited and perhaps non-difficulty-representative subset of the language.

Cute, but wrong on many counts.

78.000 (-2 advanced, -3 grandmaster), pretty good for a second language; the test's maximum appears to be 85.000.

The alternatives to choose between appear to be LLM-generated, you can see several patterns ("now" and "forever" appear a lot).

Years ago, I used to play a similar game that you could keep playing and where you levelled up when you had enough words correct in a row, or down for a single mistake. A fun thing about it was that at very high levels, it got easier for me because they mixed in some old English words which were essentially the same as in Dutch, my native language. There was a charity aspect to it as well, I think it was https://freerice.com/ , but they seem to have simplified the game now.

The university of Ghent (Belgium) also used to have an interesting test which rated your proficiency according to average scores at certain education levels. There I got 41.000 (IIRC), which was rated as average for a university-level native English speaker. An update at the bottom of https://languagehat.com/ghent-vocabulary-test/ discusses where that test went and has a few alternatives. Edit: https://www.myvocab.info/en is pretty similar to this test (found in another comment).

Pretty fun.

I suggest skipping the submit button and just showing it's correct when pressing and moving on after a sec or so. Having to click on submit twice really breaks the flow.

Also in all the words I tried I noticed out of the 4 options one is the correct one, another is the opposite of the correct one, and the other 2 are random stuff. You can basically skip any option whose antonym isn't present as well.

It should be possible to respond "I don't know". When you really-really don't know, it's unfair to get a 1/4 chance at right anyway, or even better if you use routine multiple-choice tactics.

I got credit for a few that I would have happily just missed.

It is quite easy to cheese the problems: many of them don't look like word definitions ("a sharp pain in the back"), many problems have this "correct answer + opposite meaning + 2 unrelated things" answer structure, and for the second half of the answers, very often the longest answer is the correct one. The wrong options are not well designed here.

The sample of words is also heavily biased towards concepts relating to words, speech, speakers, and/or persuasion. They are likely generated by an LLM which is primed on the task of choosing words, and ends up choosing words related to "words".

For context, I'm an L2 speaker, linguistic nerd, and I use English mostly in academic/professional settings. I got 75,400 by a combination of the tactics above; in reality it might be closer to 10-15k.

The design is also painfully similar to Duolingo if anyone can spot that.

I got 88 out of 100, but all I learned from that is that I am really good at guessing. For something like 20 of the words I was able to guess by eliminating the options that sounded unlikely and in a few cases just guess from the meaning of parts of the word.

I'd prefer an "I don't know" option just for a more honest assessment of how many words I truly know versus how many words I can guess.

The 171,476 figure from OED is used inaccurately in a way that shows a gross misunderstanding of dictionaries and language. The number 171,476 refers to the number of full entries for words in “current use” as defined in the 20-volume Second Edition of the Oxford English Dictionary (OED). It does not represent words. It also does not include all the OED's variant spellings, inflected forms, phrases or run-ons (sub-entries derived from the main entries). Additionally, the OED is by no means a complete inventory of English. In fact, it's probably millions of words short, especially as it has an incredibly slow update cycle. Source: I am a dictionary editor and lexicographer, use OED daily, and know the people who make it.

I have a copy of the shorter Oxford English Dictionary from 1970 which I inherited. It is two massive volumes and is only shorter in comparison to the full dictionary which is 12 volumes (more in more modern editions).

My shorter OED contains 163,000 words (compared to the 600,000 words of the longer).

According to this site I know 71,000 words... Let's test that against the OED. I should have about a 43% chance of knowing a word picked at random.

In my totally scientific test (ha) I chose 50 words at random from the OED and discovered I knew 29 of them for a score of 58% which is more than two sigma from 43%, thus disproving the hypothesis.
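The two-sigma claim can be checked with a quick sketch, assuming a simple binomial model where each random dictionary word is known with probability 71,000/163,000. The function name `binomialZ` is mine, and the normal approximation is only rough at n = 50.

```javascript
// z-score of an observed proportion against a hypothesised one, using the
// binomial standard error sqrt(p0 * (1 - p0) / n).
function binomialZ(successes, n, p0) {
  const observed = successes / n;
  const sigma = Math.sqrt((p0 * (1 - p0)) / n);
  return (observed - p0) / sigma;
}

const p0 = 71000 / 163000;         // site's estimate / dictionary size, about 0.436
const z = binomialZ(29, 50, p0);   // 29 known out of 50 sampled words
console.log(p0.toFixed(3), z.toFixed(2));
```

Under those assumptions the result lands a little above two standard errors, so "more than two sigma" holds, though only just.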

I forgot what that was now, but it was a fun experiment.

I think it was way too easy to guess correctly by excluding obviously incorrect choices and then going with vibes.

There were many words whose meaning I couldn't have explained at all without the options, but having the options made it easy. I wouldn't count those correct answers as part of my vocabulary (even passive), even if I could answer with relative confidence.

It seems like the right answer is usually the longest of the choices; I managed to get a few just by picking the longest. It would also be nice if there was an "I don't know" option instead of guessing and skewing the results by getting it right, though maybe that's accounted for.

Got 59,800, Performance Breakdown:

Core Basics 19/20

Intermediate 17/20

Advanced 19/20

Expert 14/20

Grandmaster 12/20

I guess, it's not too bad for a non-native speaker.

Minor feedback:

1. The correct answer for "Lethargic" is "Affected by lethargy". I think, definitions should not use words that share common root with the defined word, because:

a. it makes guessing too easy

b. it basically becomes a circular definition which is meaningless

2. Options almost always include 1 correct answer, 1 direct opposite and 2 completely random. Once you learn to recognise it, you can easily rule out 2 random options and have a 50/50 guess.

Should use an Elo rating to find your level faster. Slogging through 100 basics is pointless.

Not that I want to cheat in such a game, but for many words every option except the correct definition is shorter or follows some "dumb RPG text" template.

It's as if the author used an LLM to generate wrong definitions per word instead of actually mixing in definitions of other words.

For me, most of the more complex words were adjectives, with a few nouns. And in many cases you can just see that 2/4 or 3/4 of the definitions are not for an adjective.
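A rough sketch of what "mixing definitions of words" could look like: draw the wrong options from real definitions of other words with the same part of speech, preferring ones of similar length so that length gives nothing away. The entries below are toy data written for illustration, and `pickDistractors` is a hypothetical helper, not anything the site is known to use.

```javascript
// Toy entries for illustration only; a real quiz would use a proper dictionary.
const ENTRIES = [
  { word: "frugal", pos: "adj", definition: "careful to avoid wasting money or resources" },
  { word: "gauche", pos: "adj", definition: "lacking social grace; awkward or tactless" },
  { word: "magnanimous", pos: "adj", definition: "generous and forgiving, especially toward a rival" },
  { word: "complacent", pos: "adj", definition: "uncritically satisfied with oneself or one's situation" },
  { word: "breviary", pos: "noun", definition: "a book of daily prayers and readings" },
  { word: "lethargy", pos: "noun", definition: "a lack of energy and enthusiasm" },
];

// Pick `count` wrong options: same part of speech as the target, and as close
// as possible in length to the correct definition.
function pickDistractors(target, entries, count = 3) {
  const targetLength = target.definition.length;
  return entries
    .filter((e) => e.word !== target.word && e.pos === target.pos)
    .sort(
      (a, b) =>
        Math.abs(a.definition.length - targetLength) -
        Math.abs(b.definition.length - targetLength)
    )
    .slice(0, count)
    .map((e) => e.definition);
}

console.log(pickDistractors(ENTRIES[0], ENTRIES));
```

With real data you would also want to shuffle the final four options and exclude near-synonyms of the target, neither of which this sketch attempts.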

I like it a lot, but unfortunately you can cheat a bit: there are always two opposite answers and two unrelated ones. The correct answer is (almost?) always one of the opposites.

Picking max(len(answer)) is the right choice almost every time at the higher level..

Usually the longest answer is the correct one.

Also sometimes two options are the opposites of each other. In this case, one of them is correct.

I feel like you can get close to 70/100 with these heuristics, without actually knowing any words.

I am building in the language learning sector, and this test is almost certainly not accurate (depending on what you want to measure). It's fun and cool though. But basically this is all based on a frequency list, which itself depends on the corpus. I have not been able to find a good corpus of English which is representative of modern spoken English. Spoken English depends on your age range and subculture and changes every few years. Example: https://observablehq.com/@yurivish/words

Most of the corpuses I've found heavily over-represent newspaper articles and books, obviously. So the frequency ranking is biased towards academic/crime/geopolitics, not spoken English. But even then, it depends what you most commonly speak about!

There's no better way to do it, though. I'm just providing context.

Interesting choice of words I'd say: as a French person this test is pretty much a test about “how close is the English word to the original French meaning” as the test was almost devoid of obscure words of Germanic origin.

At least I learned a bunch of «faux-amis» in the process.

If the goal is to actually calculate how many words we know, then you should include an "I don't know" option. Sure, some people will choose to guess to inflate their score, but some of us will be honest because we legitimately want to know our scores.

If you force me to guess, then I'm going to guess. Not only does that give me a 25% chance of getting it right at random, but as others have pointed out, it is very hard to make a multiple choice question that isn't guessable by an astute enough test taker. I think I knew 80 - 85 of those words, but I scored 97, because those questions were very guessable.
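To put rough numbers on the guessing effect, here is a sketch of two standard back-of-envelope calculations. `correctedScore` is the textbook correction for blind guessing (right minus wrong divided by options minus one); `impliedGuessRate` is a hypothetical helper that asks how often the guesses must have landed if the test taker's own count of known words is right. Neither comes from the site.

```javascript
// Classic correction for guessing: assumes every unknown item was a blind
// guess among `options` equally likely choices.
function correctedScore(right, wrong, options = 4) {
  return right - wrong / (options - 1);
}

// If the taker truly knew `known` items, what fraction of the remaining
// items did they guess correctly to reach `score`?
function impliedGuessRate(score, total, known) {
  return (score - known) / (total - known);
}

console.log(correctedScore(97, 3));            // 96
console.log(impliedGuessRate(97, 100, 85));    // 0.8
console.log(impliedGuessRate(97, 100, 80));    // 0.85
```

The blind-guess correction barely moves a 97, while knowing 80 to 85 words and scoring 97 implies guesses that hit 80 to 85% of the time. That is far above the 25% chance rate, which is another way of saying the wrong options were easy to rule out.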

Also, reiterating everyone else's comments with respect to the UX needing fewer clicks, and also the definitions not being exact or precise in many cases.

Feature request: fewer clicks. It should be one click per question

At first I noticed that for many questions two or three of the answers are obviously wrong. So in many cases the correct answer can be guessed easily. But then I noticed that in 90% of the cases the correct answer is the longest of the four. This makes guessing even easier. The whole thing has a vibe-slopped feel to it.

That was fun. Bit confused by the result because it says I was "wow are you stephen fry?" Which I assume meant I did decent. (72K).

But then below it said "you are a man of few words".

I take it the latter is just because I've only done the test once? But it's mixed messaging on first attempt I think.

Suggestion: Add an "I don't know" button. If I don't know a word, I can admit it - but if I have to guess, then I have a 1/4 chance of getting incorrect credit.

I got 96/100 with minimal guessing. Being a native speaker of a Romance language is a huge advantage here; words like “Quotidian” and “Defenestrate” might be exotic in English, but are almost trivial for an Italian.

The harder words are trivia questions an educated English native could get. What I mean by this is they're all words that you'd have a chance of knowing for a reason. Things like defenestrate, antidisestablishmentarianism, hippopotomonstrosesquipedaliophobia. I know these words, but these are not words I know because I've ever had cause to use them. Words can get way harder than this and still be actually used, and not strictly only in a scientific sense. I'm thinking things like "Ginnel" (narrow passage between houses) or "Vamp" (a part of a shoe) or "Moraines" (hilly landscape formed by glaciers) or "Lea" (land used to pasture animals)

I found a big problem with this - I noticed that the longest answer is very often the correct one, which kinda ruined the game. Even though I didn't want it to, it started affecting my decision-making. Luckily, I only noticed this around question 85, though those are really the tricky ones.

Good news for the project is that I think you can easily tweak the LLM to generate better alternatives.

I got 89/100, which extrapolates to 72,700. As a non-native speaker, I'm quite happy with that.

Reading through the comments, I've noticed you can tell the native speakers by their scores in the word categories. A native speaker will score 20/20 in the first two bands and progressively less in the following ones. For those who have learned English as a foreign language, the scores are more evenly distributed.

So it's not uncommon to see a native English speaker totaling 90 as 20,20,19,17,14, and a foreigner reaching the same total as 18,18,18,18,18. Strangely enough, the algorithm favors the latter, because it assigns more weight to the higher-end bands.
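The effect is easy to see by plugging both profiles into the formula from the site's "How is this calculated" page (band sizes 3,000 / 7,000 / 10,000 / 25,000 / 40,000). This is a sketch: `weightedEstimate` is my name for it, and 20 questions per band is assumed from the breakdowns posted here.

```javascript
// Band sizes quoted from the site's "How is this calculated" page.
const BAND_SIZES = [3000, 7000, 10000, 25000, 40000];
const QUESTIONS_PER_BAND = 20; // assumption, based on breakdowns posted in this thread

// Sum of (accuracy in band * band size) over the five bands.
function weightedEstimate(correctPerBand) {
  return correctPerBand.reduce(
    (total, correct, i) => total + (correct / QUESTIONS_PER_BAND) * BAND_SIZES[i],
    0
  );
}

const nativeProfile = [20, 20, 19, 17, 14];  // 90 correct, front-loaded
const learnerProfile = [18, 18, 18, 18, 18]; // 90 correct, evenly spread

console.log(Math.round(weightedEstimate(nativeProfile)));  // 68750
console.log(Math.round(weightedEstimate(learnerProfile))); // 76500
```

Same 90 correct answers, but the evenly spread profile comes out 7,750 words ahead because the last two bands carry 65,000 of the 85,000 total.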

Is this of any use? I doubt so, but it was fun.

P.S. of course a more reliable clue of nativeness is the use of "its" and "it's" interchangeably, a mistake EFL learners wouldn't do.

A common pattern is the word's true definition and its opposite, plus two mostly unrelated meanings. So, when in doubt, you can improve your chances by picking one of the opposing pair. That's a bit of a shortcoming.

"It's a dead language!" they said, "It's a waste of time!" they said, "It's not like you can talk to dead Romans." they said. WHO IS LAUGHING NOW!?

Nice one, what I noticed is that out of 4 options 1 wrong is just something looking similar in letters, and 2 options are opposite meaning of each other - so actually the choice is 1 out of 2, not 4.

Also many highest difficulty words are actually combinations of multiple smaller words which makes it easier to guess, I got more right in expert/grandmaster than in advanced.

Major flaw in the quiz: you can do great by just picking the longest definition.

A much better test, which dynamically adjusts difficulty level: https://www.myvocab.info/en

These should maybe be checked through. Many are the second or third definitions, and some even reference the word in the definition e.g Lethargic: exhibiting lethargy

An alternative algorithm which would probably converge faster than 100 questions would be something like Elo or Glicko 2.

A word's "difficulty" would be some function of how rare it is. Once you have a reasonable estimate of the user's "skill" you can infer that a user won't know more difficult words. The benefit of this is you're not spending time asking the user about words they probably know.

Of course it's possible at an individual level, difficulty does not monotonically increase as a function of how rare the word is. A person might be very familiar with a domain-specific subset of English. But the "stratified sampling" approach will also have this problem.

There is a similar problem in chess, where players have ratings which really only change on one dimension. So there can theoretically be a mismatch when puzzles are also scored on a single axis, since a "harder" puzzle that contains a motif a player is familiar with will actually be easier for the player.
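For reference, the Elo-style update suggested above could be sketched like this. It uses the standard Elo expected-score formula with the usual 400-point scale; treating each word as an "opponent" with a difficulty rating, the K factor of 32, and the function names are all assumptions for illustration.

```javascript
// Probability that a user with `userRating` knows a word of `wordDifficulty`,
// using the standard Elo logistic curve.
function expectedCorrect(userRating, wordDifficulty) {
  return 1 / (1 + 10 ** ((wordDifficulty - userRating) / 400));
}

// Move the user's rating toward the evidence: up after a correct answer,
// down after a miss, scaled by how surprising the outcome was.
function updateRating(userRating, wordDifficulty, correct, k = 32) {
  const expected = expectedCorrect(userRating, wordDifficulty);
  return userRating + k * ((correct ? 1 : 0) - expected);
}

console.log(updateRating(1500, 1500, true));   // 1516
console.log(updateRating(1500, 1500, false));  // 1484
console.log(updateRating(1500, 1100, true).toFixed(1)); // small gain for an easy word
```

Getting an evenly matched word right moves the rating by half of K, while an easy word barely moves it, which is why there is little point asking many questions far below the user's level. Glicko-style systems additionally track an uncertainty per rating, which this sketch leaves out.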

I got 35000, 18/13/9/9/6. Not my first language.

Interesting how literally everyone here's performing better than I do. Perhaps that's because I just clicked on the first option whenever I don't know about a word.

Cute, but for strange words clicking the longest explanation turned out to be almost always the correct one :)

It misses an "I don't know" button. So it has a 25% false positive by guessing bias built in, right?

The option with more words appears to be the correct answer for each question.

I could actually get almost all of the last third correctly by choosing the option that's the longest, has a semicolon, or a comma.

Aside from that, I didn't like that most of the words only had one or at most two definitions that sounded viable.

A lot of these words have either Latin or Greek origins, for most questions you can deduce the correct answer by asking the question: "Which of these would make sense to develop into a separate word through the mostly non-modern history of the language?".

I would enjoy it way more if all four options sounded equally viable, and I couldn't deduce the correct answer without actually being sure about the meaning of the word. I understand that coming up with choices like that for each question is way harder if you actually validate all of them manually.

I got a score of 76000 best estimate with 85 being correct, even though English is not my native language and I'm not that good at it.

Nice! Some feedback: The score it shows doesn't really mean anything to me. I think it would be more interesting for the user to know how they rank (perhaps in percentile terms) relative to the overall english-speaking population and/or relative to other users on the site

This app is a great example of what AI does to your brain. No one making their own choices in the app design would make each question need three clicks.

Would other people define "complacent" as "Smug satisfaction with oneself"? I'm not so sure.

Regardless, this was fun.

It's made with AI and I don't know to what extent. That's enough to have no trust in the results. As a non-native speaker I find those words weird. Some "core words" I have no idea about, but many of the expert ones are easy. So yeah, at least I hope the author had fun vibe-coding it.

78,500.

The very first one was "Unique". I wondered if "the only one of its kind" was still the correct answer, having seen "very unique" used all too often recently. They accept "only one of its kind".

Missed "hegemony" (wasn't sure a hegemony had a leader), "quotidian" (should have known that, seen it before), "ultracrepidarian" (new word to me), "absquatulate" (19th century slang), and "fartlek" (Swedish interval training).

Having the name of a former Indian state doesn't seem to be cricket.

At least I can step away from the laptop now I've got RSI.

It's hilarious that most of these words are French

>Required Reading

>Read the dictionary from A to Z. It's a gripping tale with a terrible plot.

I actually have! I was very bored with the barely-above-"see spot run" books in the classroom at around 8, and we didn't yet have open access to the school library. The dictionary was a better option than all the others I had access to (in class).

Any other dictionary-completionists in here? Regardless of size - I'm fairly sure mine was rather small, though not a pocket-sized one.

I think native speakers of Latin-derived languages have an advantage given the proposed words in my run. The list was overly biased that way. In fact, many of the advanced and grandmaster level words are basically that: Latin-derived words.

At least that was my experience as a native Italian speaker. My English vocabulary is good, but not great by any means, and by reading books in English I know that there are plenty of words that are not derived from Latin.

> Also - too many clicks per word. It's low stakes, just let me click the definition once and I'll live if I misclick.

This, and accept that people will have incorrect input and build it into the confidence. Even the smartest person in the world sometimes makes clerical errors, or has the wrong neuron fire at the wrong moment.

+1 to all these points especially the first one. I dropped off after about 10 words and didn't have a clear path to move to the next level.

It also doesn't get hard enough. Also way too many of the words are just words about long words, or the tendency to be verbose.

Plus a scroll on mobile because the submit button is below the fold, though it seems to stay in the right place after the first scroll.

> Also - too many clicks per word.

They’re also too far away. I’m on a laptop and I have to keep moving the cursor up and down just to confirm. Give each option a letter or number and let me press it to choose the answer¹.

¹ There is (was?) some service for forms which does that and it works quite well. I think it was Typeform, but I just opened the website to check and—of course—it’s now just plastered with mentions of AI so I lost interest in verifying.

100 is too many? That's two or three minutes at most.

I would suggest there's a bias in this test towards reading. More than a couple are words I know but rarely see in print. But maybe I'm too much a fan of British TV, so I hear many of their words without seeing them written down.

yeah, it should just be click->next;

I got tired after 8 words, looked at how many I'm supposed to know and gave up.

It'd be improved with statistical analysis; just progressively get harder and try to guess. If you wanted to gamify, you could update the stats after each answer.

Also the explanations are too broad.

F.e. Frugal - Economical with money or goods

I don’t think frugal means economical, it means rather over the top …

Yeah, I don’t know how to define it properly, but I don’t need to learn new words if they don’t even teach the right meaning.

AI slop

It'd also be a lot less awkward to go through 100 words if it had keyboard shortcuts (1-4 for the words, enter to submit) and if they fixed the layout shift jank
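A minimal sketch of what those shortcuts could look like, for the curious. It assumes the page layout implied by the console snippet elsewhere in this thread (the first four buttons are the options and the last button on the page is submit/next), which may not match the real markup; `keyToAction` is a made-up helper.

```javascript
// Map a pressed key to a quiz action: "1".."4" choose an option, Enter submits.
function keyToAction(key, optionCount = 4) {
  if (key === "Enter") return { type: "submit" };
  const n = Number(key);
  if (Number.isInteger(n) && n >= 1 && n <= optionCount) {
    return { type: "choose", index: n - 1 };
  }
  return null; // ignore every other key
}

// Only wire up the listener when running in a browser.
if (typeof document !== "undefined") {
  document.addEventListener("keydown", (event) => {
    const action = keyToAction(event.key);
    if (!action) return;
    const buttons = Array.from(document.querySelectorAll("button"));
    // Assumption: options are the first four buttons, submit/next is the last.
    if (action.type === "choose") buttons[action.index]?.click();
    else buttons.at(-1)?.click();
  });
}
```

Keeping the key mapping in a pure function makes it easy to test separately from the DOM wiring.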

It estimated 74k words for me, but I feel this might be inflated; much of the time when I didn't know the answer - I could vibe guess it just as you did it. The distractor answers weren't convincing enough. For starters, when an answer was based on deconstructing the word into common English words, that ruled it out. After all, if it was, then it wouldn't have been obscure.

A tangent: writing distractors for multiple choice questions is hard. From the exams I know (excluding those whose nature precludes it, such as based on calculation or rote memorization) the only one that does this brutally well is LEK (Polish medical graduate exam). It's nigh impossible to vibe guess it at more than random chance for someone outside the field.

> I suggest skipping the submit button and just showing it's correct when pressing and moving on after a sec or so.

Having an answer counted as incorrect, just because I've accidentally touched the screen of the phone? I would absolutely hate that.

I also got something around 70-80k with 95/100 correct words (I don't know or use most of these words, but the later sections have a lot of words with Greek or Latin origin, which made them easy to guess). One of my wrong words was a misclick in the first section, which I think dragged down the estimate quite a lot. You may have done something similar. I assume they use a simple formula where early misses cost you a lot and late misses cost you very little.

can't assume gaussian underlying distribution of the word-knowing, it's known zipfian. so you can't be doing anovas or anything of that nature because if you look up zipfian distribution's variance, you get Nature and Reality giving you the middle finger

Neat way to validate.

Your method of sampling could be improved further, unfortunately at the expense of ease of use. If the dictionary was sorted according to difficulty, then you could use stratified sampling.

I comment on the related aspects here.

https://news.ycombinator.com/item?id=48599769

These were likely all AI generated, or at least the alternatives were. I made an app a while ago as well, and afterwards realized AI often wanted to make a very covering answer for the correct one, making it often longer than the others, thus defeating the idea of the quiz in the process.

Usually there were two answers that sounded like the word if read by someone unfamiliar; those were short. Then there were either one or two long versions.

If there is one long version, you choose that; if two, you choose the one that would be more useful to have a word assigned to it.

> It seems like the right answer is usually the longest of the choices

You are correct. I tested that hypothesis about a dozen times and it seems that if you always pick the longest you’ll get it right somewhere in the high 70s to mid 80s. For anyone interested in testing for themselves, open the website to the first question then run this in the console (not going to spend time optimising it, it works well enough for the purpose):

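  // Counts answered questions so the loop can stop after 100
  let loopCount = 0

  const loop = setInterval(() => {
    // Click whichever of the first four buttons (the answer options) has the longest text
    Array.from(document.querySelectorAll("button")).slice(0, 4).reduce((long, curr) => curr.textContent.length > long.textContent.length ? curr : long).click()
    // Click the last button on the page twice, since each answer needs two confirmation clicks
    setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 100)
    setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 200)

    loopCount++
    if (loopCount === 100) clearInterval(loop)
  }, 500)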

Also surprisingly mostly the first or last option (might be bias)

Hahahhaha i got 62k points by just choosing the longest definitions. Great observation!

I also felt the definition of lethargic was kind of silly, especially since I had already gotten lethargy as a word in tier 1.

another feature request: add a skip or "don't know" option. if i truly don't know a word then a lucky guess would inflate my score.

I'd suggest a "toast" would suffice for the correct answers. Proceed to the next question when correct, with a "next" button when incorrect.

Keyboard shortcuts would be nice as well. When I saw it was 100 questions I bailed.

Maybe "few words" means your larger vocabulary lets you use a single word to represent a concept that someone else would need several words to say. But the conversation ends up longer when the other person asks you to define the obscure word you just used.

Combined with the factoid it features under "how is this calculated":

    However, most native speakers have an active vocabulary between 15,000 and 35,000 words.

We must be geniuses, lol.

> stephen fry

"May I compartmentalise? I hate to, but may I? may I?"

"Hold the newsreader's nose squarely, waiter, or friendly milk will countermand my trousers"

"...saying the same weary things time after weary time: I love you. Don't go in there. Get out. You have no right to say that. Stop that. Why should I. That hurt. Help. Marjorie is dead"

https://www.youtube.com/watch?v=3MWpHQQ-wQg (fantastic sketch!)

It's hilarious that most of these words are French

English has this weird dichotomy where most of the words in a typical sentence are Germanic, while most of the words in the dictionary are French.

Fun fact: according to a quick count by AI using web search, the previous sentence contains 21 words of Germanic origin, 2 of Latin origin, 2 of Greek origin and 1 of French origin. Also the etymology of the word Germanic is Latin, while that of the word French is Germanic

Norman French due to the Norman invasion of 1066 resulting in Old English evolving into Middle English. You can see that in the words for animals vs meats (cow and boef/beef, sheep and mutton, etc.) where the Germanic people raised the sheep and the Norman aristocracy ate them.

A lot of the more common and simpler words are Germanic, as is the grammar (e.g. compound words like cupboard).

Depends: is bratwurst a German word or an English one? You will be hard pressed to find an American who doesn’t know the word and what it means. You can buy them at just about any grocery store and they are a staple of many restaurants.

At some point the word becomes both. Sourced from its mother language and maybe even still meaning the same thing in both, but no less an English word than any other at this point.

They are not. Quite a few have Latin roots and the like that corresponding French words share.

French speakers of English usually have quite a good vocabulary. Getting to the point of speaking English is a milestone that's quite difficult for French speakers, though.

+1 to all these points especially the first one. I dropped off after about 10 words and didn't have a clear path to move to the next level.

yeah, it should just be click->next;

I got tired after 8 words, looked at how many I'm supposed to know and gave up.

It'd be improved with statistical analysis; just progressively get harder and try to guess. If you wanted to gamify, you could update the stats after each answer.
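A rough sketch of the "progressively get harder" idea, assuming words are ordered by frequency rank and that answers are noise-free (neither is known to be true of this quiz). The names `estimateKnownRank` and `knowsWordAt` are made up for illustration:

```javascript
// Sketch only: binary-search for the frequency rank where the quiz taker
// stops knowing words. `knowsWordAt` is a hypothetical callback that would
// quiz one word near the given rank and return true or false.
function estimateKnownRank(maxRank, knowsWordAt, questions = 20) {
  let low = 0;        // highest rank assumed known
  let high = maxRank; // lowest rank assumed unknown
  for (let i = 0; i < questions && high - low > 1; i++) {
    const mid = Math.floor((low + high) / 2);
    if (knowsWordAt(mid)) {
      low = mid;  // known: look at harder (rarer) words next
    } else {
      high = mid; // unknown: look at easier words next
    }
  }
  return low;
}

// Simulated taker who knows every word up to rank 42,000 out of 170,000.
const simulatedRank = estimateKnownRank(170000, (rank) => rank <= 42000);
```

With real people the answers are noisy (lucky guesses, gaps in otherwise familiar ranges), so an actual quiz would need several words per step rather than one.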

I think you mean it's lognormal, at least if we're discussing native English speakers or comparing those with similar amounts of exposure to the language.

(The median English speaker almost certainly knows several thousand words, or word stems to avoid duplication. But the number who know all words in the tail is exceptionally small.)

No way is vocab size Zipfian. Word counts from a corpus follow Zipf's law, but not vocab sizes themselves.

Otherwise the most common vocab size would be equal to one.

Neat way to validate.

Your method of sampling could be improved further, unfortunately at the expense of ease of use. If the dictionary was sorted according to difficulty, then you could use stratified sampling.
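For what it's worth, a stratified estimate could look something like the sketch below. The strata sizes and quiz results are invented numbers, and `estimateVocabulary` is a hypothetical helper, not how the site actually computes its score:

```javascript
// Sketch only: each stratum is a difficulty band of the dictionary.
// size    = number of dictionary words in the band (made-up figures)
// asked   = number of words sampled from the band
// correct = number of sampled words answered correctly
function estimateVocabulary(strata) {
  let total = 0;
  for (const { size, asked, correct } of strata) {
    if (asked === 0) continue; // nothing sampled, so no estimate for this band
    total += size * (correct / asked); // scale the hit rate up to the band
  }
  return Math.round(total);
}

const exampleStrata = [
  { size: 10000, asked: 20, correct: 20 },  // common words
  { size: 40000, asked: 20, correct: 15 },  // intermediate words
  { size: 120000, asked: 20, correct: 5 },  // rare words
];
const vocabularyEstimate = estimateVocabulary(exampleStrata); // 70000
```

The point of stratifying is that the rare band, which holds most of the dictionary, gets its own hit rate instead of being swamped by the easy words.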

I comment on the related aspects here.

https://news.ycombinator.com/item?id=48599769

Yeah this is AI slop I don't like..

Usually there were two short answers that sounded like the word if read by someone unfamiliar with it, then either one or two long ones.

If there is one long version, you choose that; if there are two, you choose the one that would be more useful to have a word assigned to it.

> It seems like the right answer is usually the longest of the choices

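  // Browser-console script: every 500 ms, click the longest answer, for 100 rounds.
  let loopCount = 0

  const loop = setInterval(() => {
    // Assumes the first four buttons on the page are the answer choices.
    Array.from(document.querySelectorAll("button")).slice(0, 4).reduce((long, curr) => curr.textContent.length > long.textContent.length ? curr : long).click()
    // Then click the last button twice (presumably submit, then confirm), re-querying each time.
    setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 100)
    setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 200)

    loopCount++
    if (loopCount === 100) clearInterval(loop)
  }, 500)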

Also, surprisingly, it's mostly the first or last option (might be bias)

Hahahhaha i got 62k points by just choosing the longest definitions. Great observation!

I also felt the definition of lethargic was kind of silly, especially since I had already gotten lethargy as a word in tier 1.

another feature request: add a skip or "don't know" option. if i truly don't know a word then a lucky guess would inflate my score.
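Without a "don't know" option, one could at least correct for lucky guesses after the fact. A minimal sketch, assuming every unknown word is a uniformly random guess among four options (which the "pick the longest" trick shows is optimistic); `correctForGuessing` is a made-up name:

```javascript
// Sketch only: standard correction for guessing.
// If a fraction p of words is truly known and the rest are random guesses,
// the observed hit rate is p + (1 - p) / options. Solve that for p.
function correctForGuessing(correct, asked, options = 4) {
  const chance = 1 / options;
  const observed = correct / asked;
  const known = (observed - chance) / (1 - chance);
  return Math.max(0, known); // below-chance scores are clamped to zero
}

// 70 right out of 100 suggests roughly 60% of the words were actually known.
const knownFraction = correctForGuessing(70, 100);
```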

I'd suggest a "toast" would suffice for the correct answers. Proceed to the next question when correct, with a "next" button when incorrect.

Keyboard shortcuts would be nice as well. When I saw it was 100 questions I bailed.

Combined with the factoid it features under "how is this calculated":

    However, most native speakers have an active vocabulary between 15,000 and 35,000 words.

We must be geniuses, lol.

That tracks. Active vocabulary means the set of words that someone knows well enough to actually use in their speech or writing.

That's always going to be smaller than the set of words for which a person can choose the correct definition out of four options.

For sure there is a bit of selection bias with hackernews users. Not saying we are all geniuses, but I strongly believe we are, at least, more educated than your average Joe

There are words that I know from this quiz that I would never use in real life or in my writings. I’m not sure why. That’s the active vocabulary distinction.

You are almost always going to find people with above average reading and writing skills on an online forum - especially one with such "curated" audience and spartan UI.

English has this weird dichotomy where most of the words in a typical sentence are Germanic, while most of the words in the dictionary are French.

Yes, English is a post-Hastings collision between Norman French and Anglo Saxon.

At some point the word becomes both. Sourced from its mother language and maybe even still meaning the same thing in both, but no less an English word than any other at this point.

Bratwurst is still a German word. It doesn't become English just because it's used by native English speakers. If you start to tweak it a bit, it could become an English word. Like "fish" vs. "Fisch" in German. Or "good" vs. "gut" in German.

It also had "weltschmerz" in the list, but I think I have only ever heard "ennui" used in English. They are both foreign words, but I would not have thought of weltschmerz as a loan word. Then again, maybe I am not reading the right texts.

They are not. Quite a few have Latin roots and the like that corresponding French words share.

Approximately 0.0% of those came into English through Latin, while around 100% came through Norman French.

English is the PHP of human languages.

I'm not sure PHP deserved that...

English also has a ridiculously high fraction of Latin too.

Not from Latin but through French. The direct use of Latin in English is generally restricted to technical jargon and legal terms (which English often also shares with French).

Latin isn't really any sort of parent to Old English afaik, even though the Romans ran Britain for a while.

> Also - too many clicks per word. It's low stakes, just let me click the definition once and I'll live if I misclick.

Holy moly, the clicking is too much: 3 clicks that could be one :O

It also doesn't get hard enough. Also way too many of the words are just words about long words, or the tendency to be verbose.

Level 5 grandmaster was hardcore!

It does get hard enough but only in the very last fraction.

Zenzizenzizenzic for example.

It gets impossible. Yarborough is apparently not a town in England. I guess technically it's a village but come on...

> It also doesn't get hard enough

Oh come on! Like you really knew what "Hippopotomonstrosesquippedaliophobia" is?

Lol. Yeah. Non native here but gave up at about 50 words. Too many words, too easy. And my English SUCKS

> Also - too many clicks per word.

They’re also too far away. I’m on a laptop and I have to keep moving the cursor up and down just to confirm. Give each option a letter or number and let me press it to choose the answer¹.

it's intentional. therefore testing vocab isn't the point.

I'm guessing it's testing our susceptibility to machine-generated compliments

Plus a scroll on mobile because the submit button is below the fold, though it seems to stay in the right place after the first scroll.

Vibe coders don't know 'bout my dvh.

100 is too many? That's two or three minutes at most.

Did you actually do 100 words? It wasn't two or three minutes. With good UX, sure. But I wasn't getting through 1 word per second.

It'd also be a lot less awkward to go through 100 words if it had keyboard shortcuts (1-4 for the words, enter to submit) and if they fixed the layout shift jank
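The shortcut handling itself is tiny. A sketch of the key mapping, with a hypothetical `actionForKey` helper (the site's real markup and handlers are unknown, so the wiring is only hinted at in a comment):

```javascript
// Sketch only: translate a key press into a quiz action.
// Keys "1".."4" select an option, "Enter" submits, anything else is ignored.
function actionForKey(key, optionCount = 4) {
  if (key === "Enter") return { type: "submit" };
  if (/^[1-9]$/.test(key)) {
    const index = Number.parseInt(key, 10) - 1; // zero-based option index
    if (index < optionCount) return { type: "select", index };
  }
  return null;
}

// In the page, this would be hooked up along the lines of:
// document.addEventListener("keydown", (event) => {
//   const action = actionForKey(event.key);
//   if (action) { /* click the matching option or the submit button */ }
// });
```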

It wouldn't even let me tab to submit: you had to click, tab through each following option, then to submit, but then you had to tab again to confirm the submission!

Also the explanations are too broad.

E.g. Frugal - Economical with money or goods

I don’t think frugal means economical; it means rather over the top …

Yeah I don’t know how to define it properly but I don’t need to learn new words if they don’t even teach the right meaning

AI slop

That seems a pretty good definition of 'frugal' to me. To be excessively frugal would be miserly, tight-fisted or whatever.

There were a couple of definitions I did think were a bit off, e.g. 'zenith' and 'nihilism'. And one word where two answers seemed valid but I forget which.

Sometimes it gives one of several possible meanings but that's a valid choice.

In general I think it's a fun quiz - agreed with others though that the word selection brackets aren't ideal. It spends a lot of time on everyday vocabulary, then jumps straight into long words that someone made up one day as a joke.

The words I find most interesting are those that convey some subtle nuance, or describe some very specific thing - tools for old crafts, uncommon but genuinely used adjectives and the like. Very few of those appear.

"Frugal" most definitely does not mean "rather over the top" unless that is some new slang meaning I've never heard of.

You can look it up in a dictionary? See, e.g., https://www.oed.com/search/dictionary/?scope=Entries&q=fruga...

That definition hinges on their definition for “economical” - adding a qualifier like “excessively economical” would’ve been good I think.

Seems like you don't know what frugal means at all!