July 14th, 2024

Can ChatGPT do data science?

A study led by Bhavya Chopra at Microsoft, with contributions from Ananya Singha and Sumit Gulwani, explored the challenges data scientists face when using ChatGPT for data science tasks. The strategies participants used to work around those challenges included prompting techniques and leveraging domain expertise.

ChatGPT's ability to assist in data science tasks was explored in a study led by Bhavya Chopra at Microsoft, with contributions from collaborators including Ananya Singha and Sumit Gulwani. The study highlighted challenges faced by data scientists when using ChatGPT, such as difficulties in providing context, false assumptions made by the AI, and misaligned expectations in responses. Data scientists struggled with adapting ChatGPT's responses to their tasks, dealing with repeated code generation, and validating the generated code. Strategies to overcome these challenges included various prompting techniques, leveraging domain expertise, and considering alternative resources. Recommendations for designing AI-powered data science tools included providing preemptive context selection interfaces, inquisitive feedback loops, and transparent mechanisms for sharing context and domain expertise. The study emphasized the need for improved tools to facilitate efficient interactions between data scientists and AI assistants in the data science domain.

4 comments
By @simonw - 3 months
I read this post (and then the paper) scratching my head over two details: which OpenAI model was this using (it just said "ChatGPT") and was it using ChatGPT's data analysis mode, aka ChatGPT Code Interpreter.

I finally found this tucked away towards the end of the paper:

"Browser-based version of ChatGPT version was updated as we conducted our task-based studies. P1-P9 used the March 14, 2023 version of ChatGPT, and P10-P14 used the March 23, 2023 version."

GPT-4's release date was March 14th 2023 - but only for paid users. Was this study using GPT-4 or GPT-3.5? That's still not clear to me.

Even more importantly: was it using Code Interpreter / data analysis mode? https://help.openai.com/en/articles/8437071-data-analysis-wi...

I'm almost certain it was not - none of the screenshots showed evidence of that mode, and that feature was made available (as part of the plugins beta) to beta testers on March 23rd - it's weird that the dates mentioned in the paper happen to coincide exactly with the GPT-4 and the Code Interpreter preview dates!

Without including data analysis mode I don't think this paper addresses the question "can ChatGPT do data science?". That's what that 15-month-old feature is for, and any study that doesn't include it is missing the most important factor in answering that question.