If you force me to guess, then I'm going to guess. Not only does that give me a 25% chance of getting it right at random, but as others have pointed out, it is very hard to make a multiple choice question that isn't guessable by an astute enough test taker. I think I knew 80 - 85 of those words, but I scored 97, because those questions were very guessable.
Also, reiterating everyone else's comments with respect to the UX needing fewer clicks, and also the definitions not being exact or precise in many cases.
I've seen other systems like this calibrate far more quickly by assigning a sort of score and confidence behind the scenes. Confidence starts out low and increases over time - correct/incorrect answers rapidly adjust score at the beginning, then things settle down.
In practice this means you get a sequence of increasingly uncommon words initially, until you get one wrong, then you drop back to something easier until you start getting things right again, and eventually circle around words at your level.
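A minimal sketch of that kind of calibration, assuming difficulty is a single number (say, a word-frequency rank in thousands) and that "confidence" is modelled as a step size that shrinks after every answer. The names `next_estimate`, `run_session`, and the `shrink` factor are made up for illustration; this is not how any particular site does it.

```python
def next_estimate(level, step, correct, shrink=0.7, min_step=1.0):
    """One update of a hypothetical adaptive estimator.

    level: current difficulty estimate (higher = rarer words)
    step:  how far one answer moves the estimate (large step = low confidence)
    """
    # A correct answer pushes the estimate up, a wrong one pulls it down.
    level = level + step if correct else max(0.0, level - step)
    # Confidence grows over time, so later answers move the estimate less.
    step = max(min_step, step * shrink)
    return level, step


def run_session(answers, start_level=5.0, start_step=16.0):
    """Feed a sequence of True/False answers through the estimator."""
    level, step = start_level, start_step
    history = []
    for correct in answers:
        level, step = next_estimate(level, step, correct)
        history.append(round(level, 2))
    return level, step, history


level, step, history = run_session([True, True, False, True])
print(history)  # climbs quickly, drops after the miss, then settles
```

The early answers swing the estimate a lot and the later ones only nudge it, which is the "circle around words at your level" behaviour described above.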
Also - too many clicks per word. It's low stakes, just let me click the definition once and I'll live if I misclick (or add an undo button).
I suggest skipping the submit button and just showing it's correct when pressing and moving on after a sec or so. Having to click on submit twice really breaks the flow.
Also in all the words I tried I noticed out of the 4 options one is the correct one, another is the opposite of the correct one, and the other 2 are random stuff. You can basically skip any option whose antonym isn't present as well.
My shorter OED contains 163,000 words (compared to the 600,000 words of the longer).
According to this site I know 71,000 words... Let's test that against the OED. I should have about a 43% chance of knowing a word picked at random.
In my totally scientific test (ha) I chose 50 words at random from the OED and discovered I knew 29 of them for a score of 58%, which is more than two sigma from 43%, thus disproving the hypothesis.
I forgot what that was now, but it was a fun experiment.
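For anyone who wants to redo the arithmetic, here is a rough sketch. It assumes the 50 words are independent draws and uses the normal approximation to the binomial; the function name `z_score` is just for illustration.

```python
import math


def z_score(claimed_known, dictionary_size, sample_size, sample_known):
    """Compare an observed hit rate with the rate the site's estimate implies."""
    p0 = claimed_known / dictionary_size      # expected share of known words
    p_hat = sample_known / sample_size        # observed share in the sample
    # Standard deviation of the sample proportion if p0 were true.
    sigma = math.sqrt(p0 * (1 - p0) / sample_size)
    return p0, p_hat, (p_hat - p0) / sigma


p0, p_hat, z = z_score(71_000, 163_000, 50, 29)
print(f"expected {p0:.1%}, observed {p_hat:.1%}, z = {z:.2f}")
```

With these numbers z comes out just above 2, so the "more than two sigma" claim holds, though only barely with a sample of 50.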
Core Basics 19/20
Intermediate 17/20
Advanced 19/20
Expert 14/20
Grandmaster 12/20
I guess, it's not too bad for a non-native speaker.
Minor feedback:
1. The correct answer for "Lethargic" is "Affected by lethargy". I think definitions should not use words that share a common root with the defined word, because:
a. it makes guessing too easy
b. it basically becomes a circular definition which is meaningless
2. Options almost always include 1 correct answer, 1 direct opposite and 2 completely random. Once you learn to recognise it, you can easily rule out 2 random options and have a 50/50 guess.
But then below it said "you are a man of few words".
I take it the latter is just because I've only done the test once? But it's mixed messaging on first attempt I think.
Your method of sampling could be improved further, unfortunately at the expense of ease of use. If the dictionary was sorted according to difficulty, then you could use stratified sampling.
I comment on the related aspects here.
If there is one long version, you choose that; if there are two, you choose the one that would be more useful to have a word assigned to it.
You are correct. I tested that hypothesis about a dozen times and it seems that if you always pick the longest you’ll get it right somewhere in the high 70s to mid 80s. For anyone interested in testing for themselves, open the website to the first question then run this in the console (not going to spend time optimising it, it works well enough for the purpose):
  // Counts answered questions so the loop stops after 100.
  let loopCount = 0
  const loop = setInterval(() => {
    // Of the first four buttons (the answer options), click the one with the longest text.
    Array.from(document.querySelectorAll("button")).slice(0, 4).reduce((long, curr) => curr.textContent.length > long.textContent.length ? curr : long).click()
    // Click the last button on the page twice, since each word needs two confirmation clicks.
    setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 100)
    setTimeout(() => Array.from(document.querySelectorAll("button")).at(-1).click(), 200)
    loopCount++
    if (loopCount === 100) clearInterval(loop)
  }, 500)

However, most native speakers have an active vocabulary between 15,000 and 35,000 words.
We must be geniuses, lol.
"May I compartmentalise? I hate to, but may I? may I?"
"Hold the newsreader's nose squarely, waiter, or friendly milk will countermand my trousers"
"...saying the same weary things time after weary time: I love you. Don't go in there. Get out. You have no right to say that. Stop that. Why should I. That hurt. Help. Marjorie is dead"
https://www.youtube.com/watch?v=3MWpHQQ-wQg (fantastic sketch!)
Fun fact: according to a quick count by AI using web search, the previous sentence contains 21 words of Germanic origin, 2 of Latin origin, 2 of Greek origin and 1 of French origin. Also the etymology of the word Germanic is Latin, while that of the word French is Germanic
A lot of the more common and simpler words are Germanic, as is the grammar (e.g. compound words like cupboard).
At some point the word becomes both. Sourced from its mother language and maybe even still meaning the same thing in both, but no less an English word than any other at this point.
This, and accept that people will have incorrect input and build it into the confidence. Even the smartest person in the world sometimes makes clerical errors, or has the wrong neuron fire at the wrong moment.
They’re also too far away. I’m on a laptop and I have to keep moving the cursor up and down just to confirm. Give each option a letter or number and let me press it to choose the answer¹.
¹ There is (was?) some service for forms which does that and it works quite well. I think it was Typeform, but I just opened the website to check and—of course—it’s now just plastered with mentions of AI so I lost interest in verifying.
I would suggest a bias in this test towards reading. More than a couple are words I know but rarely see in print. But maybe I'm too much a fan of British TV, so I hear many of their words without seeing them written down.
E.g. Frugal - Economical with money or goods
I don’t think frugal means economical; it means rather over the top …
Yeah I don’t know how to define it properly but I don’t need to learn new words if they don’t even teach the right meaning
Ai slop
There were a couple of definitions I did think were a bit off, e.g. 'zenith' and 'nihilism'. And one word where two answers seemed valid but I forget which.
Sometimes it gives one of several possible meanings but that's a valid choice.
In general I think it's a fun quiz - agreed with others though that the word selection brackets aren't ideal. It spends a lot of time on everyday vocabulary, then jumps straight into long words that someone made up one day as a joke.
The words I find most interesting are those that convey some subtle nuance, or describe some very specific thing - tools for old crafts, uncommon but genuinely used adjectives and the like. Very few of those appear.
There were many words I couldn't have explained the meaning of at all if I hadn't had the options, but having the options made it easy. I wouldn't count those correct answers as a part of my vocabulary (even passive), even if I could answer with relative confidence.
The alternatives to choose between appear to be LLM-generated, you can see several patterns ("now" and "forever" appear a lot).
Years ago, I used to play a similar game that you could keep playing and where you levelled up when you had enough words correct in a row, or down for a single mistake. A fun thing about it was that at very high levels, it got easier for me because they mixed in some old English words which were essentially the same as in Dutch, my native language. There was a charity aspect to it as well, I think it was https://freerice.com/ , but they seem to have simplified the game now.
The university of Ghent (Belgium) also used to have an interesting test which rated your proficiency according to average scores at certain education levels. There I got 41.000 (IIRC), which was rated as average for a university-level native English speaker. An update at the bottom of https://languagehat.com/ghent-vocabulary-test/ discusses where that test went and has a few alternatives. Edit: https://www.myvocab.info/en is pretty similar to this test (found in another comment).
I'd prefer an "I don't know" option just for a more honest assessment of how many words I truly know versus how many words I can guess.
Also sometimes two options are the opposites of each other. In this case, one of them is correct.
I feel like you can get close to 70/100 with these heuristics, without actually knowing any words.
Most of the corpuses I've found heavily over-represent newspaper articles and books, obviously. So the frequency ranking is biased towards academic/crime/geopolitics, not spoken english. But even then, it depends what you most commonly speak about!
There's no better way to do it, though. I'm just providing context.
Also, many of the highest-difficulty words are actually combinations of multiple smaller words, which makes them easier to guess. I got more right in expert/grandmaster than in advanced.
Aside from that, I didn't like that most of the words only had one or at most two definitions that sounded viable.
A lot of these words have either Latin or Greek origins, for most questions you can deduce the correct answer by asking the question: "Which of these would make sense to develop into a separate word through the mostly non-modern history of the language?".
I would enjoy it way more if all four options sounded equally viable, and I couldn't deduce the correct answer without actually being sure about the meaning of the word. I understand that coming up with choices like that for each question is way harder if you actually validate all of them manually.
I got a score of 76000 best estimate with 85 being correct, even though English is not my native language and I'm not that good at it.
Interesting how literally everyone here's performing better than I do. Perhaps that's because I just clicked on the first option whenever I don't know about a word.
(The median English speaker almost certainly knows several thousand words, or word stems to avoid duplication. But the number who know all words in the tail is exceptionally small.)
Otherwise the most common vocab size would be equal to one.
That's always going to be smaller than the set of words for which a person can choose the correct definition out of four options.
Latin isn't really any sort of parent to Old English afaik, even though the Romans ran Britain for a while.
Zenzizenzizenzic for example.
I'm guessing it's testing our susceptibility to machine-generated compliments
A tangent: writing distractors for multiple choice questions is hard. From the exams I know (excluding those whose nature precludes it, such as those based on calculation or rote memorization) the only one that does this brutally well is LEK (Polish medical graduate exam). It's nigh impossible to vibe guess it at more than random chance for someone outside the field.
Some of the words chosen are rather absurd/inappropriate: breviary (which I got wrong but felt like a vaguely religious word) was characterized as intermediate but I think it's much more obscure and less obvious than that; Hippopotomonstrosesquippedaliophobia was used as a word (I got that wrong as well) - any type of 'phobia' word is really the sort of thing a fourth grader opens up a page in the dictionary and points out, not a word that is used... ever; metamorphosis and kinetic were labeled expert, which I don't agree with (what elementary schooler doesn't learn about the metamorphosis of a caterpillar into a butterfly? what high schooler doesn't learn about kinetic energy?).
Most words were reasonably well defined in a way that most people would understand or recognize. A few words had poor definitions: lethargy ("the state of being lethargic" - obvious); complacent ("smug satisfaction with oneself" - I disagree that complacency is intrinsically smug); magnanimous ("generous toward a rival" - I disagree that a rival must be involved); gauche ("socially awkward" - this is sort of close but the given definition completely misses the idea of being tactless).
They call it scientific and give a hand-wavey formula, but they don't explain how words are stratified in the first place. If stratified sampling is a formally recognized method of doing this, it would be nice to have a link to a real reference. I think I know a lot of words, but I am skeptical of the estimate this app provided (north of 75k).
(context: native English speaker, big reader, huge nerd, perfect SAT score)
I got all 100 correct on the first try without looking anything up! Confusingly, that only resulted in a "SCIENTIFIC ESTIMATE" that I know 85,000/~170,000 words?
Their "How is this calculated" page that appears at the end explains their error:
> According to the Oxford English Dictionary (Second Edition), there are approximately 171,476 words in current use.
> We use Stratified Sampling. Instead of testing random words, we divide the language into 5 distinct difficulty bands based on frequency of use:
> 1. Core Basics ~3,000 words
> 2. Intermediate ~7,000 words
> 3. Advanced ~10,000 words
> 4. Expert ~25,000 words
> 5. The Obscure ~40,000+ words
> If you answer 2 out of 3 'Intermediate' questions correctly, we estimate you know roughly 66% of the 7,000 words in that band.
> Total Score = Σ (Accuracy in Band × Band Size)
Their strata add up to 85,000, not ~170k, so a perfect score still comes out at only about 50%.
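The mismatch is quick to check. This sketch only uses the band sizes and the OED figure quoted from their page; the "40,000+" band is treated as exactly 40,000, which is an assumption, and the variable names are mine.

```python
# Band sizes as quoted on the "How is this calculated" page.
BANDS = {
    "Core Basics": 3_000,
    "Intermediate": 7_000,
    "Advanced": 10_000,
    "Expert": 25_000,
    "The Obscure": 40_000,  # quoted as "~40,000+", taken at face value here
}
OED_WORDS_IN_USE = 171_476  # figure the page attributes to the OED

# With 100% accuracy in every band, the formula just sums the band sizes.
max_score = sum(BANDS.values())
coverage = max_score / OED_WORDS_IN_USE

print(max_score)            # 85000
print(f"{coverage:.1%}")    # share of the quoted OED total a perfect score can reach
```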
They're also using a pretty limited and perhaps non-difficulty-representative subset of the language.
Cute, but wrong on many counts.
The sample of words is also heavily biased towards concepts relating to words, speech, speakers, and/or persuasion. They are likely generated by an LLM which is primed on the task of choosing words, and end up choosing words related to "words".
For context, I'm an L2 speaker, linguistic nerd, and I use English mostly in academic/professional settings. I got 75,400 by a combination of the tactics above; in reality it might be closer to 10-15k.
The design is also painfully similar to Duolingo if anyone can spot that.
As if the author used an LLM to generate wrong definitions for each word instead of actually mixing in definitions of other words.
For me, most of the more complex words were adjectives, with a few nouns. And in many cases you can just see that 2/4 or 3/4 of the definitions are not for an adjective.
So it's not uncommon to see a native English speaker totaling 90 as 20,20,19,17,14, and a foreigner reaching the same total as 18,18,18,18,18. Strangely enough, the algorithm favors the latter, because it assigns more weight to the higher-end bands.
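To put numbers on that, here is a sketch that plugs both score profiles into the published formula (accuracy in band times band size, summed). It assumes 20 questions per band and the band sizes quoted elsewhere in the thread; `estimate` is a hypothetical helper, not the site's code.

```python
# Band sizes quoted from the site's explanation page (easiest band first).
BAND_SIZES = [3_000, 7_000, 10_000, 25_000, 40_000]


def estimate(correct_per_band, questions_per_band=20):
    """Total Score = sum over bands of (accuracy in band x band size)."""
    return sum(
        size * correct / questions_per_band
        for correct, size in zip(correct_per_band, BAND_SIZES)
    )


native = estimate([20, 20, 19, 17, 14])    # strong on easy bands, weaker later
learner = estimate([18, 18, 18, 18, 18])   # flat profile, same 90/100 total

print(native)   # 68750.0
print(learner)  # 76500.0
```

Same 90 correct answers, but the flat profile scores 7,750 words higher because the two largest bands carry most of the weight.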
Is this of any use? I doubt so, but it was fun.
P.S. of course a more reliable clue of nativeness is the use of "its" and "it's" interchangeably, a mistake EFL learners wouldn't make.
Good news for the project is that I think you can easily tweak the LLM to generate better alternatives.
I got 89/100, which extrapolates to 72,700. As a non-native speaker, I'm quite happy with that.
At least I learned a bunch of «faux-amis» in the process.
At least that was my experience as a native Italian speaker. My English vocabulary is good, but not great by any means and by reading books in English I know that there are plenty of words that are not derived from Latin
A word's "difficulty" would be some function of how rare it is. Once you have a reasonable estimate of the user's "skill" you can infer that a user won't know more difficult words. The benefit of this is you're not spending time asking the user about words they probably know.
Of course it's possible that, at an individual level, difficulty does not monotonically increase as a function of how rare the word is. A person might be very familiar with a domain-specific subset of English. But the "stratified sampling" approach will also have this problem.
There is a similar problem in chess, where players have ratings which really only change on one dimension. So there can theoretically be a mismatch when puzzles are also scored on a single axis, since a "harder" puzzle that contains a motif a player is familiar with will actually be easier for the player.
The very first one was "Unique". I wondered if "the only one of its kind" was still the correct answer, having seen "very unique" used all too often recently. They accept "only one of its kind".
Missed "hegemony" (wasn't sure a hegemony had a leader), "quotidian" (should have known that, seen it before), "ultracrepidarian" (new word to me), "absquatulate" (19th century slang), and "fartlek" (Swedish interval training).
At least I can step away from the laptop now I've got RSI.
>Read the dictionary from A to Z. It's a gripping tale with a terrible plot.
I actually have! I was very bored with the barely-above-"see spot run" books in the classroom at around 8, and we didn't yet have open access to the school library. The dictionary was a better option than all the others I had access to (in class).
Any other dictionary-completionists in here? Regardless of size - I'm fairly sure mine was rather small, though not a pocket-sized one.
For all its shortcomings, this was part of the fun, deducing the likely correct answer when you see a word for the first time.
In case of online quiz you can have a "competition" between distractors:
1. start by having much more distractors than needed and pick randomly
2. for each measure the probability of it getting clicked (clicks/times it's shown)
3. show the most frequently clicked distractors more often
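A rough sketch of those three steps, assuming each word has its own pool of candidate wrong answers. The class name `DistractorPool`, the smoothing in `click_rate`, and the weighted draw are my own choices for illustration; other ways of favouring the most-clicked distractors would work too.

```python
import random


class DistractorPool:
    """Tracks how often each candidate distractor fools people (hypothetical helper)."""

    def __init__(self, distractors):
        # Step 1: start with many more distractors than a question needs.
        self.stats = {d: {"shown": 0, "clicked": 0} for d in distractors}

    def click_rate(self, distractor):
        # Step 2: clicks / times shown, smoothed (an assumption) so that
        # distractors nobody has seen yet still get a chance.
        s = self.stats[distractor]
        return (s["clicked"] + 1) / (s["shown"] + 2)

    def pick(self, k=3, rng=random):
        # Step 3: weighted draw without replacement, favouring high click rates.
        candidates = list(self.stats)
        chosen = []
        for _ in range(k):
            weights = [self.click_rate(d) for d in candidates]
            d = rng.choices(candidates, weights=weights, k=1)[0]
            candidates.remove(d)
            chosen.append(d)
        for d in chosen:
            self.stats[d]["shown"] += 1
        return chosen

    def record_click(self, distractor):
        self.stats[distractor]["clicked"] += 1


pool = DistractorPool(["a tool", "a bird", "a dance", "a fabric", "a coin"])
shown = pool.pick(3)
pool.record_click(shown[0])  # pretend a test taker fell for the first one
```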
Regardless, this was fun.
Some at Level 4 were definitely a lot more obscure than those.
I had frugal stored as more than just economical.
Thanks for your comments :/
What is?
> I'm guessing it's testing our susceptibility to machine-generated compliments
I fail to see the point. For one, the compliments aren’t particularly good or interesting; for another, I didn’t even read them (I just went back to check after your comment), I simply clicked when seeing green.
Hippopotamus does mean river horse and I was caught out by that (note the o instead of a in ...poto...). I think that word is really a joke - lol - a bit like floccinausilihilipilification, which I wont bother looking up the speling 4.
I had to look up the English word (lumbago), but German has the colorful “Hexenschuss” (witch shot). I suspect most people above a certain age can relate to there being a word for this in most languages.
Yeah. Clocked it from the landing page.
Yes, exactly like this.
I agree that it doesn't seem 'smug', but weirdly both dictionary dot com and Wiktionary give 'smug' as a synonym or part of the definition.
But they also analyze 'smug' as equivalent to self-satisfied or self-complacent, so maybe that's the word whose meaning is not as expected.
(I would think of "smug" less as "self-" anything - it implies a relation, it's more like exulting in a superior situation one has over someone. And 'complacent' is at base being content with one's situation, but often with the negative implication that one should be acting to make things better instead)
Breviary: this was, to me, known and not uncommon. It's widely known to Catholics, but also, if you have an interest in medieval art or books, you'd likely know it too. It was one of the main types of books before the invention of the printing press. Think of an image from an illuminated manuscript, 50% chance it's from one.
Hippopotomonstrosesquippedaliophobia: it's not that you're expected to know the whole word, but they're looking for you to recognize components of it and infer the meaning from that. I knew sesquipedalian (sometimes jokingly used in "long word" contexts) so that was easy: but phobia is also easily identifiable, and hippo, from the Latin root, I knew was not as obvious as the animal, but probably something like "large" (clue: the Hippodrome). So you could, even knowing only "phobia" and being able to guess "hippo", have a good basis for your choice.
Complacent and gauche: have heard both these uses, I think that's straightforwardly correct. If this was a dictionary that would, at worst, be the 2nd or 3rd definition. No complaints.
Source: I used to place in spelling bees and could've been a contender but I didn't have the discipline to study the dictionary for hours on the weekends, which is the next level.
It really could do with a summary showing the answers you made and corrections for what you got wrong.
I agree there were too many clicks per word; it took me too long to finish. But I also found it too easy to guess the few words I did not know.
95% of Americans.
As it usually happens in this kind of "check your vocabulary" tests in English, being Greek gives you an advantage in higher levels ;-)
But the choice of "advanced words" seems a bit odd. Obscure isn't that obscure.
Sure there are some speciality words, but most of these words are just the stuff you're gonna hear on radio4 in normal conversation
I attribute most of my success in life to reading early and often. Bartending in college rounded out the social skills (for me) but those two skills have carried me further than I anticipated, coming from a poor background.
Have you found the same to be true?
I did the full 100. The odds aren't even 1 in 4: with the harder ones, when one description is significantly longer than the others, it's the correct one. Even outside that, 2 choices are usually some object, which I think is never the correct answer.
I'd also say the toughness should be mixed up a little. The last 30 or so became a slog
Cool idea though!
Maybe I should consider myself as one :)
So not surprising perhaps that many of the more obscure words end up being french.
say what you like about antidisestablishmentarianism; at least it's an ethos
well the point would be to see how susceptible you are to that. They're figuring out where your cost vs reward tipping point is.
xylo- = wood; -logy = study
Indeed from M-W: "a branch of dendrology dealing with the gross and the minute structure of wood"
Flibbertigibbet appears in some of the Little House on the Prairie (Laura and Mary) books, if I remember right.
And I've also read Gulliver's Travels which is where Brobdingnagian comes from. Brobdingnag was a land of giants. Pretty sure I've seen the word used elsewhere though.
Speaking of things that stick... arachibutyrophobia is the fear of getting peanut butter stuck to the roof of your mouth. (I admit I had to look that one up, as it's not nearly as memorable, though I knew the word existed).
It's hard to disestablish a religion. Too many people believe. In Russia, the Russian Orthodox Church came back after Communism went down. Now Putin uses it to reinforce his rule.
I too can say it and I'm very English...ish. LlanPG is a tourist attraction and a great example of an amateur advertising idea smashing it!
Anyway, if they were running metrics on that they just became useless because I automated responding to it a bunch of times.
I'll remark that "if you have interest in [some particular academic pursuit], you'd likely know it" is a pretty decent description of the sort of word that shows up in "grandmaster" tier.
(I have joked that, living in Japan, my English is getting worse faster than my Japanese is getting better, but breviary might well be a concrete example.)
Except "hippo-" is from Greek and means "horse".
See NGRAMs: https://books.google.com/ngrams/graph?content=Breviary%2CHip...
Well.. Hippos is Greek for horse, and Hippopotamus is a "river horse". Same for Hippodrome, a course for horses. And in Latin, hypo means small (and not large), as seen in e.g. hypoglycemia.
I think some of them were flawed - I can't remember what it was now, but for one word two of the meanings were kind of appropriate, but I chose the wrong one, and I think there were 2-3 words I didn't know but guessed from the components in the words. At least one I also guessed that way, but got the complete opposite meaning!
I like this kind of test, but for me, the first 2 sections (which I aced) were kind of redundant. Maybe they needed to stratify it more or do it more dynamically, e.g. maybe do half the layer 1 questions, and if you get all them correct, move on to half the layer 2 questions. If you get one wrong, you get the rest of the layer 2 questions, and maybe if you get more than a certain number of those wrong you also have to go back and do the rest of layer 1. If you ace the first half of layer 2 as well as layer 1, maybe you jump straight into layer 3, etc...
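A simplified sketch of that idea: ask the first half of each layer, and only ask the second half if there was a mistake. It deliberately leaves out the "go back and finish the previous layer" rule, and `run_layers` and `answer_fn` are names invented for the example.

```python
def run_layers(layers, answer_fn):
    """Ask half of each layer first; ask the rest only after a mistake.

    layers:    list of word lists, easiest layer first
    answer_fn: callable taking a word and returning True if answered correctly
    Returns one list of (word, correct) pairs per layer.
    """
    results = []
    for layer in layers:
        half = (len(layer) + 1) // 2
        first_half, second_half = layer[:half], layer[half:]
        layer_results = [(word, answer_fn(word)) for word in first_half]
        # A clean first half means the rest of the layer is skipped.
        if not all(correct for _, correct in layer_results):
            layer_results += [(word, answer_fn(word)) for word in second_half]
        results.append(layer_results)
    return results


# Toy run: the test taker knows only these words.
known = {"cat", "dog", "frugal"}
layers = [["cat", "dog", "sun", "tree"], ["frugal", "zenith", "gauche", "kinetic"]]
for layer_results in run_layers(layers, lambda w: w in known):
    print(layer_results)
```

In the toy run the first layer stops after two questions, while the miss on "zenith" triggers the full second layer.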
There were many words I didn’t know though.
I suppose they evaluate difficulty based on origin of the word. If you already know German or Spanish you may have a head start when learning English, but on a different subset of it.
MARGARETTA: How do you find a word that means Maria?
BERTHE: A flibberti gibbet!
SOPHIA: A willo' the wisp!
MARGARETTA: A clown!
If there wasn't this fancy word for that, used in tests and quizzes, it would be a footnote in history.
When another person dies in Russia by being thrown from the window, nobody calls it defenestration. We just call it a Tuesday.
It depends: do you get its and it's right? :)
Of course, for a native speaker at least, but for people with English as a second language there are many lower-class words that we never encountered before, because they simply don't occur in books or in online discussions. I got 88 correct out of 100 in this list but I'm almost certain I'd have fared much worse had the list been about niche house or agricultural items.
What counts as "obscure" is highly context dependent.
I can assure you that just about every American that has made it through middle school has been taught about kinetic energy. Let alone high school.
edit: also, native English (well, American) speaker
I got ~1/3, which is a very generous estimate even for the "recall" case (recognizing a word), and it is obviously false for the "generate" case (using it in speech), where I guess my vocabulary is likely ~1/90 of all English words.
Perhaps just because it suits my learning style, I find learning is actually easier if I attempt to work something out or guess it, and then am corrected when wrong, because then I have a memory to anchor it on. If I skip that part and just try to learn some facts, very little is retained. One consequence of this is that I prefer science / logic based subjects to things like history or geography (as in places, etc, not the science parts) where it's just a bunch of arbitrary facts that you can't just guess or work out for yourself.
Have they retained that knowledge beyond the test at the end of the semester?
Anecdotal observations would imply that they have indeed been taught it, and indeed have failed to retain the concept.
I have no rigorous data regarding either; but the generally poor outcomes which appear as result of a lack of retention of scientific, math, socio-economic, and anthropological instruction do seem self evident both from within and outside of the US, in headlines and actions, writ large and for all to see.
Is the problem the use of teaching methods which focus on short-term memorization rather than conceptual comprehension? Is it the lack of support for instructors? Is it a lack of focus in the student body? Is it some or all of the above in varying degree? Or something else entirely?
A lot of prestigious and scholarly vocabulary in English has come in through Latin and Greek (at various points in the history of English!), so you can learn that vocabulary or make it more memorable or more transparent either by studying Latin and Greek as languages, or just by studying some of their common morphemes (e.g. there are lists of Latin and Greek roots that may be given to medical or life sciences students to help them learn to recognize the meaning of terminology coined from these languages, even without speaking the languages).
But I think it's actually unrepresentative of the English language as a whole if we're literally thinking about vocabulary size rather than historical prestige of some part of the vocabulary. For example, foreign foods like "nori", "pandan", "dolma", "vichyssoise"[1], or "berbere" are often used as English words and would probably appear in large English dictionaries nowadays. None of that was tested in this quiz. I saw one foreign political term which I guessed at, and one or two German loanwords which I knew (I've also studied German), and almost everything else was Latin or Greek origins!
[1] apparently coined by a French-speaking American based on French roots?
Even if you're an introvert, working for a couple months at Olive Garden when you're 19 helps you to smile and be polite when 80% of the customers are mouth breathing idiots. Turns out they aren't all mouth breathers and those para social skills come into play later during your career.
I highly support kids of all origins working in service for a bit. Ain't a class thing, but is very helpful in getting used to the breadth and depth of people.
I managed a paltry 90/100. Some of those words require a classical education and probably a British one at that. I studied Latin at two posh schools and have O level English Language and Literature (that's two qualis at age 16).
I'm pretty well read and know exactly who Sandi and Stephen are. Ironically Sandi is Danish but notably erudite (that turned up for me) and navigates her way around English with remarkable aplomb.
There are few professions where it's not unusual to have an hour+ conversation about literally any topic, and then potentially do it again the next day with the same person about a different topic. More similar to a therapist than customer service.
In the last batch there were a few words that I was vaguely confident of but a lot more of them seemed like "stunt" words I would never see because every time they'd need defining so why bother.
Also I was assuming it was picking from a huge set, but it seems everybody was shown the same words, so while it's supposedly a "sample", any bias, even if unintended, shows up in the results. If you wanted to be scientific, perhaps you'd do this for 1000 words and then sample 100 questions from that for each participant or something.
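Something along these lines, as a sketch: keep a larger pool per difficulty band and draw a fresh subset for each participant. The 5 bands of 200 words and 20 questions per band are assumptions chosen to match the 1000/100 figures above, and `draw_quiz` is a made-up name.

```python
import random


def draw_quiz(pool_by_band, per_band, seed=None):
    """Draw a per-participant quiz: `per_band` distinct words from each band."""
    rng = random.Random(seed)  # a seed makes one participant's draw reproducible
    return {band: rng.sample(words, per_band) for band, words in pool_by_band.items()}


# Placeholder pool: 5 bands x 200 words = 1000 words in total.
pool = {f"band{b}": [f"word_{b}_{i}" for i in range(200)] for b in range(1, 6)}

quiz = draw_quiz(pool, per_band=20, seed=42)
print(sum(len(words) for words in quiz.values()))  # 100 questions
```

Because each participant sees a different subset, a badly chosen word affects only some of the results instead of skewing everyone's score the same way.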