r/dataisbeautiful • u/DR_C_USP • 2d ago
[OC] Who feels the most pressure in college? Data says there are 3 kinds of students
The radar plot highlights distinct respondent profiles across the five stress domains:
- ~60% Cluster 0 (Low–Moderate Responders)
- Consistently low on physiological, emotional, academic, and environmental stress.
- Slightly higher on lifestyle/behavior.
- ~20% Cluster 1 (Academic/Environmental Strain)
- Strongly elevated in academic stress and especially environmental stress.
- Moderate on lifestyle/behavior.
- ~20% Cluster 2 (High Stress Group)
- Very high across physiological and emotional stress.
- Also above average in academics.
- Lower than Cluster 1 on environmental stress.
- n = 843 college students
- Source: u/article {ovi2025protecting,title={Protecting Student Mental Health with a Context-Aware Machine Learning Framework for Stress Monitoring},author={Ovi, Md Sultanul Islam and Hossain, Jamal and Rahi, Md Raihan Alam and Akter, Fatema}, journal={arXiv preprint arXiv:2508.01105}, year={2025}}
- Tool: GPT-5
15
u/rainmouse 2d ago
Yeah, it's not clear what many of the stresses mean. Also, the source links to a suspended account.
1
u/DR_C_USP 2d ago
Suspended? Maybe due to inactivity. It is my school account. I dusted it off after a couple years of not posting anything.
4
u/themodgepodge 2d ago
Your citation contains “u/article,” which is interpreted by Reddit as referring to a username.
3
u/DR_C_USP 2d ago
I see! Thanks, I copied the citation directly from the source and it had the "u/article" embedded. I'll be more careful.
6
u/Krampus_noXmas4u 2d ago edited 2d ago
Definitely need more explanation of what the measures are and how they were taken. You should not need to go to an originating source for that info; it should be presented with the infographic and its explanation. Also, this was generated by GPT-5; was the output validated for accuracy?
Edit: I definitely want to stress not depending on external links for measure explanations and methods, since the source link leads to a suspended Reddit account. That is also another reason to question how this output was validated. Not saying it's wrong; it's just lazy, which always leads to questions.
2
u/DR_C_USP 2d ago
As much as one can validate a chart made by GPT. If it helps, I only used GPT to create the radar visualization, I used Excel for my initial analysis.
6
u/themodgepodge 2d ago edited 2d ago
PDF link for the source paper, which cites two Kaggle datasets:
- Student Stress Factors: A Comprehensive Analysis (high school and university students in Nepal). This appears to be the dataset feeding OP’s chart.
- Stress and Well-being Data of College Students
1
u/Galacticsauerkraut 2d ago
meaning?
You could take a sample of people and separate it into 3 clusters by whatever criteria you like, namely:
- There's tall, middle, and short
- There's rich, middle, and poor
- There's smart, average, and stupid
1
u/InstanceNoodle 2d ago
I guess that could be true. But usually, a cluster means a group of closely bunched data points. Your 3 ideas might have random or close to zero grouping.
I think there could be correlations if the study were done in a poorer country. Tall people got more milk or food when they were young, which suggests coming from a rich family and being able to go to school and learn things (smart).
In the United States, there are many poor places with poor water drainage, lead leaching from the pipes, and poor air quality, which can lead to poor health and lower IQ. Food stamps and free school lunches do offset the lack of nutrients to some extent.
2
u/DR_C_USP 2d ago
Cluster analysis is a way of letting the data sort people into groups based on how similar their answers are. Instead of us deciding the categories ahead of time, the algorithm finds natural patterns — so students who respond in similar ways end up in the same cluster, and distinct groups emerge where their stress profiles look very different.
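To make "how similar their answers are" concrete, here is a minimal sketch, assuming each student is represented by five domain scores and that similarity is measured with plain Euclidean distance. The three students and their scores are made up for illustration and are not taken from the dataset; `profile_distance` is a hypothetical helper name.

```python
import math

# Hypothetical 1-5 scores, in the order:
# (physiological, emotional, academic, environmental, lifestyle)
students = {
    "A": (1.5, 1.5, 2.0, 1.5, 2.5),
    "B": (1.8, 1.4, 2.2, 1.6, 2.4),
    "C": (4.5, 4.6, 3.5, 3.0, 3.0),
}

def profile_distance(name_1, name_2):
    """Euclidean distance between two students' score profiles."""
    return math.dist(students[name_1], students[name_2])

d_ab = profile_distance("A", "B")
d_ac = profile_distance("A", "C")

# A smaller distance means the two students answered more alike.
print(f"A vs B: {d_ab:.2f}")
print(f"A vs C: {d_ac:.2f}")
```

In this toy data, A and B sit close together while C is far from both, so a distance-based clustering method would tend to group A with B and put C elsewhere.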
3
1
u/CAustin3 2d ago
This is a great experiment on just how far into academia students who can't write an elementary school five-paragraph essay without AI can get.
Cluster 0 is confident. Cluster 2 is anxious. Cluster 1 is invented by the student writing the thesis hoping that defining and playing with the term "environmental stress" will be enough to get into grad school with.
Love the flagrant audacity of citing "GPT-5" as a "tool," but this reads more like 4o, so the students might be genuinely unable to figure out which AI they've been using.
1
u/DR_C_USP 2d ago
The clusters are not invented. I used a k-means clustering algorithm, and 3 groups emerged from the students' patterns of answers. The algorithm tries different ways of grouping them so that students in the same group are as similar as possible and the groups are as different from each other as possible. When k-means looked at all the student responses, it found one cluster of students who were not especially high on physiological or emotional stress (so they didn’t fit Cluster 2, the “high stress everywhere” group), but who consistently rated higher on questions about professors, the working environment, and home/hostel difficulties, hence Cluster 1.
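For anyone curious what that loop looks like, below is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the analysis behind the chart: the twelve profiles are invented to loosely mirror the three described groups, and the farthest-first starting centroids are a simplification I chose so the toy example is deterministic (real implementations usually use random or k-means++ initialization).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two score profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of profiles."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def farthest_first(points, k):
    """Deterministic start: each new centroid is the point farthest from those chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids

def k_means(points, k, max_iterations=100):
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    return labels, centroids

# Invented group centers, in the order:
# (physiological, emotional, academic, environmental, lifestyle)
made_up_centers = [
    (1.5, 1.5, 1.5, 1.5, 2.0),  # low across the board
    (2.0, 2.0, 4.0, 4.5, 3.0),  # academic/environmental strain
    (4.5, 4.5, 3.5, 3.0, 3.0),  # high physiological/emotional stress
]
# Four slightly shifted copies of each center stand in for individual students.
points = [
    tuple(value + 0.1 * i for value in center)
    for center in made_up_centers
    for i in range(4)
]

labels, centroids = k_means(points, k=3)
print(labels)
```

The cluster numbers that come out are arbitrary labels; what matters is which students end up sharing a label and what each centroid's profile looks like.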
32
u/Nillavuh 2d ago
Can you explain more what is meant by "environmental" stress and maybe also "lifestyle / behavior" stress? Is the latter related to wanting to get involved with the social scene on campus and the stresses involved with that?