Hey everyone,
A little history: The last language I seriously studied was Mandarin Chinese. The grammar was easy, but memorizing all the tones was painful. I ended up using data analysis to find patterns to help me make educated guesses. My big regret from that time is that I never published anything that others could learn from.
A month ago, I started learning Dutch. I'm probably about halfway through A1 now, and the grammar is a whole different world compared to Chinese. The first big "slap in the face" was, of course, de and het.
So, I reverted to my old data science habits to tackle the problem. But this time, I wanted to make sure my effort wasn't just for me alone. I decided to publish my work as a free, interactive tool that everyone can use.
The core idea is this: Stop memorizing 'de' and 'het' one word at a time.
My app helps you see the patterns by grouping words with similar meanings (what data scientists call semantic clusters). The goal is to help you learn articles for entire "families" of words, so you can start making educated guesses instead of relying on pure memorization.
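To give a rough idea of what "guessing from a word family" means, here is a toy sketch. It is not the app's actual code: the 3-number vectors are invented purely for illustration (real word embeddings have hundreds of dimensions), and the names `TOY_NOUNS`, `nearest_nouns`, and `guess_article` are hypothetical. It finds the nouns closest in "meaning" to a word and lets their articles vote.

```python
import math
from collections import Counter

# Toy vectors, made up for this example only; they are NOT real embeddings.
# The articles themselves are the real ones.
TOY_NOUNS = {
    "hond": ("de", (0.9, 0.1, 0.0)),
    "kat": ("de", (0.8, 0.2, 0.0)),
    "paard": ("het", (0.7, 0.3, 0.1)),
    "huis": ("het", (0.0, 0.9, 0.3)),
    "gebouw": ("het", (0.1, 0.8, 0.4)),
    "kasteel": ("het", (0.1, 0.7, 0.5)),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_nouns(word, n=2):
    """Return the n nouns most similar to `word`, with their articles."""
    _, target = TOY_NOUNS[word]
    scored = [
        (cosine(target, vec), other, article)
        for other, (article, vec) in TOY_NOUNS.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(other, article) for _, other, article in scored[:n]]


def guess_article(word, n=2):
    """Guess an article by majority vote among the n nearest nouns."""
    votes = Counter(article for _, article in nearest_nouns(word, n))
    return votes.most_common(1)[0][0]


print(nearest_nouns("huis"))
print(guess_article("huis"))
```

With these made-up vectors, the building words end up next to each other, so the neighbours of "huis" all vote for the same article. Real data is messier, so treat a vote like this as an educated guess and not a rule.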
You can check out the app here: https://dutch-data-analysis.streamlit.app/
Since I'm still a beginner myself, I'm sure there are insights and patterns that I haven't seen. I would absolutely love to hear your feedback, suggestions, or any interesting things you discover with the tool.
Let me know what you think!
Dank je wel!
Edit:
There are many other things you can do with this app:
- You can see the word (noun) length per article.
- You can enter a word, get the n nouns closest to it in meaning, and see their articles.
- You can also see suffixes and prefixes attached to each article.
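For anyone curious how the suffix view could work, here is a minimal sketch. It is an illustration under my own assumptions, not the code behind the app: the six-word sample and the function name `article_counts_by_suffix` are hypothetical, and a real analysis would run over a full noun list.

```python
from collections import Counter

# Tiny hand-picked sample of (article, noun) pairs, for illustration only.
SAMPLE = [
    ("de", "vrijheid"),
    ("de", "schoonheid"),
    ("de", "woning"),
    ("de", "regering"),
    ("het", "meisje"),
    ("het", "huisje"),
]


def article_counts_by_suffix(pairs, suffixes):
    """Count how often each article appears with each suffix."""
    counts = {suffix: Counter() for suffix in suffixes}
    for article, noun in pairs:
        for suffix in suffixes:
            if noun.endswith(suffix):
                counts[suffix][article] += 1
    return counts


counts = article_counts_by_suffix(SAMPLE, ("heid", "ing", "je"))
for suffix, tally in counts.items():
    print(f"-{suffix}: {dict(tally)}")
```

On this small sample, -heid and -ing only show up with de, and the diminutive -je only with het, which is the kind of pattern the tool is meant to surface.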
I have many ideas for future additions, and not only about de and het. I am also considering using bigger datasets, as far as my computational resources allow.