June 26th, 2024

Why We're Deeply Invested in Making AI Better at Math Tutoring

Khan Academy is advancing AI for math tutoring with Khanmigo, aiming to mimic human tutors. Despite some errors, efforts continue to improve tutoring with tools like calculators, GPT-4 Turbo, and GPT-4o models. They prioritize enhancing AI's tutoring capabilities and sharing insights with the education community.

Khan Academy is deeply invested in improving AI for math tutoring to help students struggling with math concepts. Their pilot AI tutor, Khanmigo, aims to guide students through learning processes like a human tutor would. Despite occasional mistakes, efforts are ongoing to enhance Khanmigo's tutoring abilities and evaluation of student work. Recent improvements include using a calculator for numerical problem-solving, upgrading to a more advanced language model (GPT-4 Turbo), and exploring new models like GPT-4o. The team is also focusing on enhancing AI's thought process during tutoring sessions and sharing insights with the education community. Khan Academy remains committed to refining AI tutoring to support students in achieving their academic goals, not only in math but also in humanities. The organization's dedication to leveraging AI for education underscores their mission to provide free, high-quality learning opportunities globally.

Related

Some Thoughts on AI Alignment: Using AI to Control AI

The GitHub content discusses AI alignment and control, proposing Helper models to regulate AI behavior. These models monitor and manage the primary AI to prevent harmful actions, emphasizing external oversight and addressing implementation challenges.

Lessons About the Human Mind from Artificial Intelligence

In 2022, a Google engineer claimed AI chatbot LaMDA was self-aware, but further scrutiny revealed it mimicked human-like responses without true understanding. This incident underscores AI limitations in comprehension and originality.

Francois Chollet – LLMs won't lead to AGI – $1M Prize to find solution [video]

The video discusses limitations of large language models in AI, emphasizing genuine understanding and problem-solving skills. A prize incentivizes AI systems showcasing these abilities. Adaptability and knowledge acquisition are highlighted as crucial for true intelligence.

Hackers 'jailbreak' powerful AI models in global effort to highlight flaws

Hackers exploit vulnerabilities in AI models from OpenAI, Google, and xAI, sharing harmful content. Ethical hackers challenge AI security, prompting the rise of LLM security start-ups amid global regulatory concerns. Collaboration is key to addressing evolving AI threats.

Mozilla roll out first AI features in Firefox Nightly

Mozilla is enhancing Firefox with AI features like local alt-text generation for images in PDFs. Users can access various AI services for tasks, promoting user choice, privacy, and a personalized internet experience.

24 comments
By @k8sagic - 7 months
Great!

I failed math at university. Why? Because the tutors did not have the time to help me. My level of math was not the same as the other students', as I was not in the math track of a gymnasium.

I struggled and wasted a lot of time and energy to even find good explanations.

And when I had a math group, one girl was super nice but knew so much more than I did because of her math in gym. Professors assumed so much knowledge and no one cared to try to help people.

The best help came from people from India on YouTube with bad English.

And the most ridiculous part: Every year around the globe people teach this level of university math to probably millions of students. We should have the perfect free educational platform which teaches everyone perfectly already because so many tutors and professors lecture on the same topics over and over and over again. Our educational system is a joke.

By @JustinSkycak - 7 months
Conversational dialogue seems like a fascinating distraction.

Many people who have (unsuccessfully) attempted to apply AI to education have focused too much on the "explanation" part and not enough on scaffolding, navigating, and managing the entire learning process. It’s easy to go on a wild goose chase building an explanation AI.

You fall in love with the idea of AI having conversational dialogue with students, and then you get lost in the weeds of complexity. You solve just enough of the problem to produce a cool demo, yet you're still hopelessly far away from self-service learning in real life.

I don't think conversational dialogue is even necessary.

What we do at mathacademy.com is hard-code explanations and break them up into bite-size pieces that are served at just the right moment. And we close the feedback loop by having students solve problems, which they need to do anyway. (The student's "response" is whether they got the problem correct.)

Sure, hard-coding explanations feels tedious, takes a lot of work, and isn't "sexy" like an AI that generates responses from scratch – but at least it's not a pipe dream. It's a practical solution that lets us move on to other components of the AI that are just as important.

What are those other components? A handful off the top of my head:

* After a minimum effective dose of explanation, the AI needs to switch over to active problem-solving. Students should begin with simple cases and then climb up the ladder of difficulty, covering all cases that they could reasonably be expected to solve on a future assessment.

* Assessments should be frequent and broad in coverage, and students should be assigned personalized remedial reviews based on what they answered incorrectly.

* Students should progress through the curriculum in a personalized mastery-based manner, only being presented with new topics when they have (as individuals, not just as a group) demonstrated mastery of the prerequisite material.

* After a student has learned a topic, they should periodically review it using spaced repetition, a systematic way of reviewing previously-learned material to retain it indefinitely into the future.

* If a student ever struggles, the system should not lower the bar for success on the learning task (e.g., by giving away hints). Rather, it should strengthen a student’s area of weakness so that they can clear the bar fully and independently on their next attempt.
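Two of the components listed above, mastery-based progression and spaced repetition, can be sketched in a few lines. This is only an illustration under stated assumptions, not mathacademy.com's actual algorithm: the prerequisite graph `PREREQS`, the `TopicState` record, and the "double the interval on success, reset on a miss" rule are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class TopicState:
    mastered: bool = False
    interval_days: int = 1  # gap until the next review
    due_day: int = 0        # day index on which the next review is due


# Hypothetical prerequisite graph: topic -> topics that must be mastered first.
PREREQS = {
    "fractions": [],
    "ratios": ["fractions"],
    "linear_equations": ["fractions"],
    "slope": ["ratios", "linear_equations"],
}


def unlocked_topics(states):
    """Unmastered topics whose prerequisites this student has all mastered."""
    return sorted(
        topic
        for topic, reqs in PREREQS.items()
        if not states[topic].mastered and all(states[r].mastered for r in reqs)
    )


def mark_mastered(state, today):
    """Record mastery and schedule the first review for the next day."""
    state.mastered = True
    state.interval_days = 1
    state.due_day = today + 1


def record_review(state, today, correct):
    """Assumed rule: double the interval after a correct review, reset after a miss."""
    state.interval_days = state.interval_days * 2 if correct else 1
    state.due_day = today + state.interval_days


def due_reviews(states, today):
    """Mastered topics whose review date has arrived."""
    return sorted(t for t, s in states.items() if s.mastered and s.due_day <= today)


states = {topic: TopicState() for topic in PREREQS}
first_unlocked = unlocked_topics(states)        # only topics with no prerequisites
mark_mastered(states["fractions"], today=0)
next_unlocked = unlocked_topics(states)         # topics that depend on fractions
```

A real system would also need the frequent assessments and remedial reviews described above; here the only feedback signal is whether a review was answered correctly.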

By @endisneigh - 7 months
The irony is that the world where AI could reasonably explain and teach any level of math is the same world where most of such graduates would be unemployed anyways.

You’re already seeing an oversupply of educated folks in India and China. Graduate students working as baristas in the USA, etc. Sadly inevitable.

In any case it’s good to have the resources I suppose.

I’m also fundamentally skeptical of these stats around math struggling. What percentage of kids who would be very likely to use Khan Academy are struggling with math? And what percentage who are not would even use Khan Academy to begin with?

Most of the problems with students doing poorly are sadly societal - not to say that this isn’t useful, though.

By @LightFog - 7 months
Maybe I’m reading too much into it, but the roadmap mentioning switching from GPT-4 Turbo to GPT-4o and hoping for better math performance feels like they are betting on a significant near-term reliability improvement in LLMs without any other real plans. That magic jump is starting to look more and more doubtful by the day.
By @t_mann - 7 months
> Khanmigo now uses a calculator to solve numerical problems instead of using AI’s predictive capabilities. If you’ve been using Khanmigo recently, you may have seen that it will sometimes say it is “doing math.” This is when the math problem is running through the calculator behind the scenes.

> We’ve upgraded parts of Khanmigo to a more capable large language model, which is the software that generates human language. The more capable large language model is called GPT-4 Turbo. Our internal testing shows an improvement in math after we made the switch.

> We are beginning to test the capabilities of a new large language model called GPT-4o, and we’re evaluating other models too to see if they are stronger at math.

> We’ve improved the way AI “thinks” during a tutoring session before responding to a student. We have instructed the AI to write out all the ways in which the student may have arrived at their answer. This approach mimics how a tutor in real life works with a student. We’ve found it significantly improves the quality of math interactions.

> We’ve built new tools to track our progress on math.

> We’re sharing math examples and learnings with others in our field so that we can learn from each other.

> We’re studying the latest research papers on math performance.

Sounds like most of what they're doing is related to prompting, chain-of-thought reasoning and similar, on top of a 'vanilla' foundation model. Sounds like something an ambitious student could replicate / improve upon, so given their mission, it'd be cool if they published the exact techniques they're using and their benchmark results.
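The first quoted item, routing numerical problems to a calculator instead of letting the model predict the digits, could look roughly like the sketch below. This is not Khan Academy's implementation, which the article does not describe; the function name `calculate` and the choice of Python's `ast` module to evaluate a restricted arithmetic expression are assumptions made for illustration.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def calculate(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def ev(node):
        # Only int and float literals are allowed (bool and str are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)


fraction_sum = calculate("3/4 + 1/8")
signed_product = calculate("-(2 + 3) * 4")
```

In a setup like this the language model would only produce the expression string, and the arithmetic result would come from deterministic code.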

By @itissid - 7 months
Given that it's $4 per month[1] and they are a non-profit with just $55M in revenue, how are the GPU costs, and hence their costs with OpenAI, going to work out if it becomes really good and hundreds of millions of kids start using it?

Renting a 24GB VRAM Runpod is ~$0.50 per hour[2]. How can the math work out unless you have a non-profit energy company and a server farm attached?

[1] https://osboncapital.com/khanmigo-ai-public-vs-private/ [2] https://www.runpod.io/gpu-instance/pricing
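The arithmetic behind that question can be written out using only the two figures from the comment. The 30-day month and the around-the-clock rental are assumptions added for the example, and a rented GPU is only the commenter's proxy, since the article describes Khanmigo as running on OpenAI models.

```python
PRICE_PER_STUDENT_MONTH = 4.00  # USD per student per month, from the comment
GPU_COST_PER_HOUR = 0.50        # USD per hour for a 24GB instance, from the comment
HOURS_PER_MONTH = 24 * 30       # assumption: 30-day month, GPU rented around the clock

# GPU time that a single subscription pays for each month.
gpu_hours_per_subscription = PRICE_PER_STUDENT_MONTH / GPU_COST_PER_HOUR

# Cost of keeping one GPU running all month, and the number of students
# who would have to share it for subscriptions to cover that cost.
gpu_cost_per_month = GPU_COST_PER_HOUR * HOURS_PER_MONTH
students_per_gpu_to_break_even = gpu_cost_per_month / PRICE_PER_STUDENT_MONTH

print(gpu_hours_per_subscription, gpu_cost_per_month, students_per_gpu_to_break_even)
```

Under these assumptions one subscription covers 8 GPU-hours per month, and about 90 students would need to share each always-on GPU to break even on compute alone.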

By @light_hue_1 - 7 months
I've taught math at every level. From volunteering to teach it to grade school kids, to getting paid for high school tutoring, and at the university level both for undergrads and now with my own grad students.

I tried Khanmigo. It is counterproductive junk.

It simply doesn't understand what a student knows and it has no idea how to give examples or provide perspective shifts that help students. That's what a good tutor does. It gets stuck explaining the same things in the same ways but with different words over and over again.

It has no idea how to think geometrically vs algebraically and how to switch between the two. And it can't carry out even simple proofs.

By @empath75 - 7 months
The fun thing about getting math help from the AI is that, okay, sure, it can explain how fractions work to a third grader, but it can also explain how fiber bundles work to a graduate student. Maybe it gets the details wrong, but even if it's wrong sometimes, it's far better than googling or Wikipedia or even a textbook sometimes, because you can interrogate it interactively and ask for clarifications. Yes, a tutor or a teacher is going to be better, but not everyone has access to a math expert.
By @FredPret - 7 months
The Young Lady’s Illustrated Primer cometh
By @itissid - 7 months
One reason traditional AI agent reasoning built on Theory of Mind (ToM: I know what you are thinking, so I can predict what you will do next) was hard to get right is that the domain model and interface were quite tough to design. Agents did not fully understand and model the belief state of the person. More importantly, there was possibly also no way to quickly _generate_ lots of plans and test them in a domain-independent manner (RLHF is horribly inefficient for this). Now there are AI agents that are great at generating plans (examples of what could work), which can be tested with kids and creators, who provide feedback to tune not just the LLMs but also add to the domain models. This can solve a lot of interesting problems that have plagued tutoring systems:

1. Improve sample efficiency of generated plans to get to the right goal while meeting all the subgoals for a plan.

2. Enrich engineered and correct neuro-symbolic models of reasoning with better knowledge representations.

3. Quantitative psychometric systems that use metrics like proficiency could only be as good as the underlying “factor” model and hypothesis, but one could test multiple, more interesting hypotheses more quickly: "Are you mentally tired?", "Here is a hint: ___. Does that help?".

By @photochemsyn - 7 months
If you want good math teachers, you have to make attractive offers that will draw in the best talent. So, let's do some math:

The median price of an existing single-family home in the California Bay Area in February 2024 was $1.25 million. The range of teacher salaries in the Bay Area is $50K - $100K. A reasonable downpayment on that home would be ~$250,000, followed by yearly payments in the range of $64K (30-year) to $80K (15-year). Following the 'housing should consume 30% of income' rule - allowing teachers to have children and buy a car and go on vacations and so on - this would require salaries of $200K - $250K per year under current market conditions. Thus, anyone wanting their own home and a family will probably not be considering a career in teaching - you'd end up as a lifelong two-room apartment renter.
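The salary estimate above can be rechecked directly from the figures given in the comment. The variable names are illustrative, and the yearly payments are taken from the comment as stated rather than derived from an interest rate.

```python
HOME_PRICE = 1_250_000   # USD, median price quoted in the comment
DOWN_PAYMENT = 250_000   # USD, downpayment quoted in the comment
YEARLY_PAYMENTS = {"30-year": 64_000, "15-year": 80_000}  # USD, as stated in the comment
HOUSING_SHARE = 0.30     # the 'housing should consume 30% of income' rule

# Amount that has to be financed after the downpayment.
loan = HOME_PRICE - DOWN_PAYMENT

# Income at which each yearly payment equals 30% of gross income.
required_income = {
    term: round(payment / HOUSING_SHARE)
    for term, payment in YEARLY_PAYMENTS.items()
}

print(loan, required_income)
```

Applied strictly, the 30% rule gives roughly $213K and $267K, slightly above the rounded $200K - $250K range, which does not change the conclusion when set against the quoted $50K - $100K salary range.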

AI tutoring is a great idea - but a tutor isn't a teacher. The job of the teacher is to excite and motivate the students, provide overall guidance and appropriately difficult problem sets for the students to work on. The one-on-one tutor is there to help with the problem sets, especially when students get stuck.

Unfortunately, here in the USA we have a system that values the CEO of McDonald's - which delivers very unhealthy food to the population - at $19 million per year - and teachers are valued at the same rate as garbage truck operators (at least in the Bay Area).

The result is that most students entering higher education need one to two years of remedial math education before being able to grasp the higher math, e.g. vector calculus, linear algebra, complex analysis etc., which is needed for modern engineering and scientific disciplines. This results in a competitive disadvantage relative to China and other countries that invest heavily in childhood education.

By @1024core - 7 months
> In fourth grade, only 36% of students are proficient in math. By eighth grade, that number drops to 26%.

Followed by:

> As we come to the end of our first full pilot school year,

Did they not monitor students' performance? Do they not have concrete numbers like the ones quoted in the first paragraph? What better way to demonstrate effectiveness than to evaluate students and see how much improvement they made?

By @djaouen - 7 months
At least they are using it to teach math and not to tell us we don't need to study math anymore!
By @dougb5 - 7 months
> As we come to the end of our first full pilot school year, we’re enthusiastic about Khanmigo’s ability to tutor in math

Are there RCT results (or any results) showing the impact on students?

By @raytopia - 7 months
I can't find a link right now but Salman Khan (the founder of Khan Academy) is a believer that the future of education is 1 teacher for 1 student so it makes sense why they'd be investing so heavily in AI tutors.
By @bilsbie - 7 months
Fun, learning games! I think there’s also so much untapped potential there.

Think how much you learned playing SimCity (assuming it was realistic). Imagine effortlessly learning any skill by having fun.

By @HPsquared - 7 months
There's a conflict between those who believe the "content" is the main thing in education and those who treat it as a sports tournament with rules against "cheating".
By @linearrust - 7 months
If AI is good enough to teach math, why not have the AI 'do the math' instead? At a certain point, wouldn't we have to ask whether it makes any economic sense to teach math to the masses?

If AI is good enough to teach people to drive, then it should be good enough to drive the car itself. At that point, it would be pointless to teach the masses to drive when the AI can do it for them. If AI is good enough to cook, ... So on and so forth.

The intention is noble but where does it all lead?

By @bionhoward - 7 months
Seems disrespectful of students to write two blog posts about this without clarifying if you foist OpenAI’s explicitly anticompetitive legal terms on students.

If the OpenAI data use policy applies to Khan Academy students, they implicitly require every student user of this to not use the output to compete with OpenAI (taken literally, you can’t use what your tutor says to you for anything, since it’s always going to compete with OpenAI).

Imagine how dumb it is to say “our thing can do everything. You can use it, but you can’t use it to do anything that competes with us.”

Let that sink in. Can you name one thing you can do using the output of a thing that does everything which doesn’t compete with the everything thing?

What a shame to flush Khan Academy just to hop aboard the hype train. I know I’ll probably get downvoted for being negative but to write so much without clarifying what the “Open” AI legal terms mean for students is actively harmful to student trust.

If anyone needs a reason to pity OpenAI, just remember how they betray their true name. That is a profound failure and I would not wish it on anyone.

By @leobg - 7 months
Direct Instruction. Most underrated technology since the 70s.
By @booleandilemma - 7 months
Why not be deeply invested in hiring high quality math tutors?
By @anvil-on-my-toe - 7 months
Because it's cheaper than paying people.
By @frakt0x90 - 7 months
"To increase profits" is the obvious answer to the title. Otherwise very mundane, common sense updates. Just an engagement farming article.