Kids who use ChatGPT as a study assistant do worse on tests
A University of Pennsylvania study found that high school students using ChatGPT scored worse on tests, while a specialized AI tutor improved problem-solving but not test scores, highlighting potential learning inhibition.
A study conducted by researchers at the University of Pennsylvania examined the impact of ChatGPT on high school students' math performance. The study involved nearly 1,000 Turkish high school students who practiced math problems using three different methods: with ChatGPT, with a specialized AI tutor version of ChatGPT, and without any AI assistance. Results showed that students using ChatGPT performed worse on subsequent tests, scoring 17% lower despite solving 48% more practice problems correctly. In contrast, those using the AI tutor solved 127% more problems correctly but did not achieve better test scores than students who practiced without AI. The researchers concluded that reliance on ChatGPT may inhibit learning, as students often sought direct answers rather than developing problem-solving skills. Additionally, ChatGPT's inaccuracies—correctly answering only half of the math problems—further complicated learning outcomes. The study highlighted a tendency for students to overestimate their learning, with many believing they had benefited from AI assistance when they had not. The findings suggest that while AI can enhance practice efficiency, it may not translate to improved understanding or performance in assessments.
- Students using ChatGPT as a study aid scored worse on tests than those who did not use it.
- A specialized AI tutor version of ChatGPT improved practice problem-solving but did not enhance test scores.
- Overreliance on AI tools may inhibit the development of essential problem-solving skills.
- Students often misperceived their learning outcomes when using AI assistance.
- The study indicates that AI can improve practice efficiency but may not lead to better academic performance.
Related
AI can beat real university students in exams, study suggests
A study from the University of Reading reveals AI outperforms real students in exams. AI-generated answers scored higher, raising concerns about cheating. Researchers urge educators to address AI's impact on assessments.
How Good Is ChatGPT at Coding, Really?
A study in IEEE evaluated ChatGPT's coding performance, showing success rates from 0.66% to 89%. ChatGPT excelled in older tasks but struggled with newer challenges, highlighting strengths and vulnerabilities.
Can ChatGPT do data science?
A study led by Bhavya Chopra at Microsoft, with contributions from Ananya Singha and Sumit Gulwani, explored ChatGPT's challenges in data science tasks. Strategies included prompting techniques and leveraging domain expertise for better interactions.
Why AI is no substitute for human teachers
A study from the Wharton School found high school students using generative AI for math prep perform worse on exams, highlighting the need for guidance and the importance of human teachers.
AI cheating is getting worse. Colleges Still Don't Have a Plan
Colleges are struggling with increased cheating from AI tools like ChatGPT, prompting educators to seek innovative strategies, including curriculum integration and revised assignments, to maintain academic integrity and engagement.
In life, we rarely have the answer in front of us; we have to work it out from the things we know. It’s this struggling that builds a muscle you can then apply to any problem. ChatGPT, I suspect, is akin to looking up the answer. You’re failing to exercise the muscle needed to solve novel (to you) problems.
1. Control, with no LLM assistance at any time.
2. "GPT Base", raw ChatGPT as provided by OpenAI.
3. "GPT Tutor", improved by the researchers to provide hints rather than complete answers and to make fewer mistakes on their specific problems.
On study problem sets ("as a study assistant"), kids with access to either GPT did better than control.
When GPT access was subsequently removed from all participants ("on tests"), the kids who studied with "GPT Base" did worse than control. The kids with "GPT Tutor" were statistically indistinguishable from control.
LLMs, for me, have been tremendously useful in learning new concepts. I frequently feed it my own notes and ask it to correct any misunderstandings, or to expand on things I don’t understand.
I use it like I would an on demand tutor, but I can totally understand how it could be used as a shortcut that wouldn’t be helpful.
In the same way, I can hire a tutor that will help me actually learn, or I can hire a “tutor” that just does the homework for me. I’ve worked as a tutor so I’ve seen people looking for both, and people that don’t want to learn are always going to find a way. People who do want to learn are also going to find a way.
“Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base).”
Kids who use ChatGPT actually do “significantly” better, according to the authors. Now, I don’t know whether “significantly” means statistically significant here, because I haven’t read the methodology, but a 127% increase in performance must be something. That said, that’s a clickbaity title if I’ve ever seen one.
Edit: Upon closer reading, the increase in performance is statistically significant. Also, “access to GPT” in this case means having GPT open while solving the problems, not studying with GPT and then solving the problems, which was my first understanding from the clickbaity title. Results are not terribly surprising in that regard.
Then at Uni I'm doing Computer Graphics, which included advanced (for me) math. I was panicked, and initially struggled until one of my good friends who was also studying the same course, and is VERY good at math, was able to answer my vague "I don't get it" questions, or at least guide me to more specific questions.
I think I'm quite a visual learner, I don't think at that time there was a concept of people learning "differently". Luckily my good friend was also a visual learner, along with also being very good at math. It was like someone was able to see how my brain worked and feed me information in a way it could compile. I became quite good at math after that.
You really need to learn how to learn. It's fascinating, but also horrifying when I now consider all the lives that have been negatively impacted because this wasn't understood, and people were led to believe they couldn't do something that maybe they really wanted to be able to do.
If GenAI can help with that, I'm all in.
If you take the measuring tape away from someone who relied on that tool instead of getting good with a meterstick, or perhaps with no tool besides their own arm length, they suddenly won't be able to measure unless they go through the effort of learning to measure without the tape.
You can argue that the measuring tape is a crutch preventing people from learning how to properly measure, and that it has its own limitations, but regardless it's still really helpful, especially for people who only need to measure things occasionally, and not super long things.
ChatGPT is a tool. Just like with all other tools, like computers, cars, etc., if you take it away, most people cannot perform the function they relied on the tool to help them do.
From the (draft!) paper's abstract:
A key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs.
..
Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor).
However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base). That is, access to GPT-4 can harm educational outcomes.
These negative learning effects are largely mitigated by the safeguards included in GPT Tutor.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486
Thus, when the test rolls around, nothing is memorised and they do badly.
It's like memorising phone numbers vs. keeping them in the contacts app. Before, I memorised tons of numbers, but now they're all in the app and I barely recall my own.
That said, all things being equal, kids who write notes by hand out-perform kids who type them, even touch-type them. So maybe the old ways are better in this specific brain-knowledge-competency-understanding forming space?
The act of repetition and processing the data ourselves is what leads to a deeper understanding, and asking a chatbot for an answer seems like it would skip the thinking required when learning "the old fashioned way."
Maybe we can learn how to incorporate using chatbots in education, but I suspect there need to be guardrails on when and how they are used so students can get the benefit of doing the work themselves.
Is it me, or does this directly contradict the title?
“Those with ChatGPT solved 48 percent more of the practice problems correctly, but they ultimately scored 17 percent worse on a test of the topic that the students were learning.”
So, in the real world, where people can use ChatGPT in their jobs, the kids who use it will do better than the kids who don’t.
Maybe a better test is: can you catch ChatGPT when it is wrong? Not: can you answer without ChatGPT?
I had the suspicion that this is not aiding in my learning process even though I am able to "solve" more problems. Nice to see this confirmed. Time to stop!
Should have started with that.
A study without independent replication hardly counts as “researchers found”, much less one that hasn't even been peer-reviewed yet!
It's like when cars first came out: you ask people to drive cars for a month and they get used to cars. Then you ask them to compete in a horse race and see how fast they can go.
We should evaluate how fast they solve a problem, no matter how.
s/parents/chatgpt
After that, the student should struggle the old-fashioned way with problems.
I would like to see a study that looks at this approach.
If your human tutors just give you the answers when you ask for them, how do you think it'll go?
Soulless drivel, endlessly streaming.
And I'm confident that the education system as we know it will be severely damaged because of it.
Even in our own field, I can guarantee you that software developers that "grew up" with these garbage AI assistants will be worse coders than the generation that came before. You will never develop the understanding, the insight, that's needed by chatgpt'ing your way through college and life.
Excellent news for my own market value of course, but I don't hesitate to say that I regret the LLM hype happened, the impact on the world is overwhelmingly negative (not even touching on the catastrophic environmental and financial cost to society).
Let me tldr:
- Study had 3 groups: normal GPT; GPT with a system prompt making it act as a tutor and give hints rather than answers; and no GPT
Group 1 (normal GPT)
- 48% better on practice problems
- 17% worse on test
Group 2 (tutor GPT)
- 127% better on practice problems
- equal test score to control group
GPT errors:
- 50% error rate
- 8% error on arithmetic problems
- step-by-step instructions were wrong 42% of the time
- GPT tutor was fed answers
- students with GPT and GPT Tutor predicted that they did better (so both groups were overconfident)
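To make the reported percentages concrete, here is a minimal arithmetic sketch. The baseline score of 100 is a hypothetical control-group value (the paper reports only relative changes, not absolute scores):

```python
# Hypothetical control baseline; the study reports only relative deltas.
control_practice = 100.0
control_test = 100.0

# Practice-problem performance relative to control (from the abstract).
gpt_base_practice = control_practice * 1.48   # +48% with GPT Base
gpt_tutor_practice = control_practice * 2.27  # +127% with GPT Tutor

# Test performance once GPT access is removed.
gpt_base_test = control_test * 0.83           # -17% with GPT Base
gpt_tutor_test = control_test * 1.00          # indistinguishable from control

print(round(gpt_base_practice), round(gpt_tutor_practice),
      round(gpt_base_test), round(gpt_tutor_test))
```

The asymmetry is the whole finding: both GPT groups look better while the tool is in hand, but only the hint-giving tutor variant avoids the drop once it is taken away.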
Paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486
I'll reply with my opinion to this comment. Many of the other comments are not responding to the article's content.
Yeah you'll lift much, much more. But is that the point?