Kids who use ChatGPT as a study assistant do worse on tests
A University of Pennsylvania study found that high school students using ChatGPT scored worse on tests, while a specialized AI tutor improved problem-solving but not test scores, highlighting potential learning inhibition.
A study conducted by researchers at the University of Pennsylvania examined the impact of ChatGPT on high school students' math performance. The study involved nearly 1,000 Turkish high school students who practiced math problems using three different methods: with ChatGPT, with a specialized AI tutor version of ChatGPT, and without any AI assistance. Results showed that students using ChatGPT performed worse on subsequent tests, scoring 17% lower despite solving 48% more practice problems correctly. In contrast, those using the AI tutor solved 127% more problems correctly but did not achieve better test scores than students who practiced without AI. The researchers concluded that reliance on ChatGPT may inhibit learning, as students often sought direct answers rather than developing problem-solving skills. Additionally, ChatGPT's inaccuracies—correctly answering only half of the math problems—further complicated learning outcomes. The study highlighted a tendency for students to overestimate their learning, with many believing they had benefited from AI assistance when they had not. The findings suggest that while AI can enhance practice efficiency, it may not translate to improved understanding or performance in assessments.
- Students using ChatGPT as a study aid scored worse on tests than those who did not use it.
- A specialized AI tutor version of ChatGPT improved practice problem-solving but did not enhance test scores.
- Overreliance on AI tools may inhibit the development of essential problem-solving skills.
- Students often misperceived their learning outcomes when using AI assistance.
- The study indicates that AI can improve practice efficiency but may not lead to better academic performance.
Related
AI can beat real university students in exams, study suggests
A study from the University of Reading reveals AI outperforms real students in exams. AI-generated answers scored higher, raising concerns about cheating. Researchers urge educators to address AI's impact on assessments.
How Good Is ChatGPT at Coding, Really?
A study in IEEE evaluated ChatGPT's coding performance, showing success rates from 0.66% to 89%. ChatGPT excelled in older tasks but struggled with newer challenges, highlighting strengths and vulnerabilities.
Can ChatGPT do data science?
A study led by Bhavya Chopra at Microsoft, with contributions from Ananya Singha and Sumit Gulwani, explored ChatGPT's challenges in data science tasks. Strategies included prompting techniques and leveraging domain expertise for better interactions.
Why AI is no substitute for human teachers
A study from the Wharton School found high school students using generative AI for math prep perform worse on exams, highlighting the need for guidance and the importance of human teachers.
AI cheating is getting worse. Colleges Still Don't Have a Plan
Colleges are struggling with increased cheating from AI tools like ChatGPT, prompting educators to seek innovative strategies, including curriculum integration and revised assignments, to maintain academic integrity and engagement.
In life, we rarely have the answer in front of us; we have to work it out from the things we know. It’s this struggling that builds a muscle you can then apply to any problem. ChatGPT, I suspect, is akin to looking up the answer. You’re failing to exercise the muscle needed to solve novel (to you) problems.
1. Control, with no LLM assistance at any time.
2. "GPT Base", raw ChatGPT as provided by OpenAI.
3. "GPT Tutor", improved by the researchers to provide hints rather than complete answers and to make fewer mistakes on their specific problems.
On study problem sets ("as a study assistant"), kids with access to either GPT did better than control.
When GPT access was subsequently removed from all participants ("on tests"), the kids who studied with "GPT Base" did worse than control. The kids with "GPT Tutor" were statistically indistinguishable from control.
LLMs, for me, have been tremendously useful in learning new concepts. I frequently feed it my own notes and ask it to correct any misunderstandings, or to expand on things I don’t understand.
I use it like I would an on demand tutor, but I can totally understand how it could be used as a shortcut that wouldn’t be helpful.
In the same way, I can hire a tutor that will help me actually learn, or I can hire a “tutor” that just does the homework for me. I’ve worked as a tutor so I’ve seen people looking for both, and people that don’t want to learn are always going to find a way. People who do want to learn are also going to find a way.
“Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base).”
Kids who use ChatGPT actually do “significantly” better, according to the authors. Now, I don’t know whether “significantly” means statistically significant here, because I haven’t read the methodology, but a 127% increase in performance must be something. That said, that’s a clickbaity title if I’ve ever seen one.
Edit: Upon closer reading, the increase in performance is statistically significant. Also, “access to GPT” in this case means having GPT open while solving the problems, not studying with GPT and then solving the problems, which was my first understanding from the clickbaity title. Results are not terribly surprising in that regard.
Then at Uni I'm doing Computer Graphics, which included advanced (for me) math. I was panicked, and initially struggled until one of my good friends who was also studying the same course, and is VERY good at math, was able to answer my vague "I don't get it" questions, or at least guide me to more specific questions.
I think I'm quite a visual learner, I don't think at that time there was a concept of people learning "differently". Luckily my good friend was also a visual learner, along with also being very good at math. It was like someone was able to see how my brain worked and feed me information in a way it could compile. I became quite good at math after that.
You really need to learn how to learn. It's fascinating, but also horrifying when I now consider all the lives that have been negatively impacted because this wasn't understood, and people were led to believe they couldn't do something that maybe they really wanted to be able to do.
If GenAI can help with that, I'm all in.
If you take the measuring tape away from someone who relied on that tool instead of getting good with a meterstick, or perhaps with no tool besides their own arm length, they suddenly won't be able to measure unless they go through the effort of learning to measure without the tape.
You can argue that the measuring tape is a crutch preventing people from learning how to properly measure, and that it has its own limitations, but regardless it's still really helpful, especially for people who only need to measure things occasionally, and not super long things.
ChatGPT is a tool. Just like with all other tools, like computers, cars, etc., if you take it away, most people cannot perform the function they relied on the tool to help them do.
From the (draft!) paper's abstract:
A key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs.
..
Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor).
However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base). That is, access to GPT-4 can harm educational outcomes.
These negative learning effects are largely mitigated by the safeguards included in GPT Tutor.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486
Thus, when the test rolls around, nothing is memorised and they do badly.
It's like memorising phone numbers vs. keeping them in the contacts app. Before, I memorised tons of numbers, but now they're all in the app and I barely recall my own.
That said, all things being equal, kids who write notes by hand out-perform kids who type them, even touch-type them. So maybe the old ways are better in this specific brain-knowledge-competency-understanding forming space?
The act of repetition and processing the data ourselves is what leads to a deeper understanding, and asking a chatbot for an answer seems like it would skip the thinking required when learning "the old fashioned way."
Maybe we can learn how to incorporate using chatbots in education, but I suspect there need to be guardrails on when and how they are used so students can get the benefit of doing the work themselves.
Is it me, or does this directly contradict the title?
“Those with ChatGPT solved 48 percent more of the practice problems correctly, but they ultimately scored 17 percent worse on a test of the topic that the students were learning.”
So, in the real world, where people can use ChatGPT in their jobs, the kids who use it will do better than the kids who don’t.
Maybe a better test is: can you catch ChatGPT when it is wrong? Not: can you answer without ChatGPT?
I had the suspicion that this is not aiding in my learning process even though I am able to "solve" more problems. Nice to see this confirmed. Time to stop!
Should have started with that.
A study without independent replication hardly counts as “researchers found”, much less one that hasn't even been peer-reviewed yet!
It's like when cars first came out: you ask people to drive cars for a month and they get used to cars. Then you ask them to compete in a horse race and see how fast they can go.
We should evaluate how fast they solve a problem, no matter how.
s/parents/chatgpt
After that, the student should struggle the old-fashioned way with problems.
I would like to see a study that looks at this approach.
If your human tutors just give you the answers when you ask for them, how do you think it'll go?
Soulless drivel, endlessly streaming.
And I'm confident that the education system as we know it will be severely damaged because of it.
Even in our own field, I can guarantee you that software developers that "grew up" with these garbage AI assistants will be worse coders than the generation that came before. You will never develop the understanding, the insight, that's needed by chatgpt'ing your way through college and life.
Excellent news for my own market value of course, but I don't hesitate to say that I regret the LLM hype happened, the impact on the world is overwhelmingly negative (not even touching on the catastrophic environmental and financial cost to society).
Let me tldr:
- Study had 3 groups: normal GPT; GPT with a system prompt making it act as a tutor and give hints rather than answers; and no GPT
Group 1 (normal GPT)
- 48% better on practice problems
- 17% worse on test
Group 2 (tutor GPT)
- 127% better on practice problems
- equal test score to control group
GPT errors:
- 50% error rate
- 8% error on arithmetic problems
- step-by-step instructions were wrong 42% of the time
- GPT tutor was fed answers
- students with GPT and GPT Tutor predicted that they did better (so both groups were overconfident)
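To make the reported percentages concrete, here is a minimal arithmetic sketch. The baseline score of 100 is a hypothetical control-group value (the paper reports only relative changes, not absolute scores):

```python
# Hypothetical control baseline; the study reports only relative deltas.
control_practice = 100.0
control_test = 100.0

# Practice-problem performance relative to control (from the abstract).
gpt_base_practice = control_practice * 1.48   # +48% with GPT Base
gpt_tutor_practice = control_practice * 2.27  # +127% with GPT Tutor

# Test performance once GPT access is removed.
gpt_base_test = control_test * 0.83           # -17% with GPT Base
gpt_tutor_test = control_test * 1.00          # indistinguishable from control

print(round(gpt_base_practice), round(gpt_tutor_practice),
      round(gpt_base_test), round(gpt_tutor_test))
```

The asymmetry is the whole finding: both GPT groups look better while the tool is in hand, but only the hint-giving tutor variant avoids the drop once it is taken away.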
Paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486
I'll reply with my opinion to this comment. Many of the other comments are not responding to the article's content.
Yeah you'll lift much, much more. But is that the point?