Study Finds That 52 Percent of ChatGPT Answers to Programming Questions Are Wrong

ekZepp@lemmy.world · 4 months ago

slacktoid@lemmy.ml · 4 months ago

We need a comparison against an average coder. Some fucking baseline ffs.

hayes_@sh.itjust.works · 4 months ago

Why would we compare it against an average coder?

ChatGPT wants to be a coding aid/reference material. A better baseline would be the top-rated answer for the question on Stack Overflow, or whether the answer appears in the first 3 Google search results.

BleatingZombie@lemmy.world · 4 months ago

Or a textbook’s explanation

anachronist@midwest.social · 4 months ago

“Self driving cars will make the roads safer. They won’t be drunk or tired or make a mistake.”

Self driving cars start killing people.

“Yeah but how do they compare to the average human driver?”

Goal post moving.

Veraxus@lemmy.world · 4 months ago

I’m surprised it scores that well.

Well, ok… that seems about right for languages like JavaScript or Python, but try it on languages with a reputation for being widely used to write terrible code, like Java or PHP (hence having been trained on terrible code), and it’s actively detrimental to even experienced developers.

haui@lemmy.giftedmc.com · 4 months ago

The interesting bit for me is that if you ask a rando some programming questions, they will be 99% wrong on average, I think.

Stack Overflow still makes more sense, though.

Ech@lemm.ee · 4 months ago

For the umpteenth time - an LLM just puts words together, it isn’t a magic answer machine.

Naminreb@kbin.social · 4 months ago

A parrot blabbing the theory of relativity doesn’t make it Einstein.

walter_wiggles@lemmy.nz · 4 months ago

Yeah but it’s just going to get better at magicking. Soon all us wizards will be out of a job…

zero_spelled_with_an_ecks@programming.dev · 4 months ago

Just as soon as we no longer need to drive.

chiisana@lemmy.chiisana.net · 4 months ago

Self driving cars need to convince regulators that they’re safe enough, even assuming they master the tech.

LLMs have already convinced our bosses that we are expendable, and that they can drastically reduce cost centres for their next earnings call.

Melkath@kbin.social · 4 months ago

Developing with ChatGPT feels bizarrely like when Tony Stark invented a new element with Jarvis’ assistance.

It’s a prolonged back and forth, and you need to point out the AI’s mistakes and work through a ton of iterations to get something that is close enough that you can tweak it and use it, but it’s SO much faster than trawling through Stack Overflow or hoping someone who knows more than you can answer a post for you.

elgordio@kbin.social · 4 months ago

Yeah, if you treat it as a junior engineer, with the ability to instantly research a topic, and are prepared to engage in a conversation to work toward a working answer, then it can work extremely well.

Some of the best outcomes I’ve had have needed 20+ prompts, but I still arrived at a solution faster than any other method.

Melkath@kbin.social · 4 months ago

In the end, there is this great fear of “the AI is going to fully replace us developers”, and the reality is that while that may be a possibility one day, it won’t be any day soon.

You still need people with deep technical knowledge to pilot the AI and drive it to an implemented solution.

AI isn’t the end of the industry, it has just greatly sped up the industry.

SpicyLizards@reddthat.com · 4 months ago

I would make a 1000-monkeys-with-typewriters comment, but I see what most actual contracted devs produce…

Epzillon@lemmy.ml · 4 months ago

I worked for a year developing in Magento 2 (an open source e-commerce suite that was later bought by Adobe; it is not well maintained and is just all-around not nice to work with). I tried asking ChatGPT some Magento 2 questions to figure out solutions to my problems, but clearly the only data it was trained on was a lot of really bad solutions from forum posts.

The solutions did kinda work some of the time, but the way it suggested doing things was absolutely horrifying. We’re talking opening up lots of vulnerabilities, breaking many parts of the suite as a whole, or just editing database tables directly. If you do not know enough about the tools you are working with, implementing solutions from ChatGPT can be disastrous, even if they end up working.

crossmr@kbin.social · 4 months ago

The best method I’ve found for using it is to help you with languages you may have lost familiarity with, and to walk it through what you need step by step. This lets you evaluate its reasoning. When it gets stuck in a loop:

Try A!
Actually A doesn’t work because that method doesn’t exist.
Oh sorry Try B!
Yeah B doesn’t work either.
You’re right, so sorry about that, Try A!
Yeah… we just did this.

at that point it’s time to just close it down and try another AI.

0x01@lemmy.ml · 4 months ago

I’m a 10 year pro, and I’ve changed my workflows completely to include both ChatGPT and Copilot. I have found that for the mundane, simple, common patterns, Copilot’s accuracy is close to 9/10 correct, especially in my well-maintained repos.

It seems like the accuracy of simple answers is directly proportional to the precision of my function and variable names.

I haven’t typed a full for loop in a year thanks to Copilot; I treat it like an intent autocomplete.

ChatGPT, on the other hand, is remarkably useful for super well-laid-out questions, again with extreme precision in the terms you lay out. It has helped me in greenfield development with unique and insightful methodologies to accomplish tasks that would normally require extensive documentation searching.

Anyone who claims llms are a nothingburger is frankly wrong, with the right guidance my output has increased dramatically and my error rate has dropped slightly. I used to be able to put out about 1000 quality lines of change in a day (a poor metric, but a useful one) and my output has expanded to at least double that using the tools we have today.

Are LLMs miraculous? No, but they are incredibly powerful tools in the right hands.

Don’t throw out the baby with the bathwater.

TrickDacy@lemmy.world · 4 months ago

Refreshing to see a reasonable response to coding with AI. Never used ChatGPT for it, but my Copilot experience mirrors yours.

I find it shocking how many developers seem to think so negatively about programming with AI. Some guy recently said “everyone in my shop finds it useless”. Hard for me to believe they actually tried Copilot if they think that.

raspberriesareyummy@lemmy.world · 4 months ago

I’m a 10 year pro,

You wish. The sheer idea of calling yourself a “pro” disqualifies you. People who actually code and know what they are doing wouldn’t dream of giving themselves a label beyond “coder” / “programmer” / “SW Dev”. Because they don’t have to. You are a muppet.

figaro@lemdro.id · 4 months ago

Hey! So you may have noticed that you got downvoted into oblivion here. It is because of the unnecessary amount of negativity in your comment.

In communication, there are two parts - how it is delivered, and how it is received. In this interaction, you clearly stated your point: giving yourself the title of pro oftentimes means the person is not a pro.

What they received, however, is far different. They received: ugh this sweaty asshole is gatekeeping coding.

If your goal was to convince this person not to call themselves a pro going forward, this may have been a failed communication event.

raspberriesareyummy@lemmy.world · 4 months ago

While your measured response is appreciated, I hardly consider a few dozen downvotes relevant, nor do I care in this case. It’s telling that those who did respond to my comment seem to assume I would consider myself a “pro”, when 1) that’s nothing I said and 2) it should be clear from my comment that I consider the expression cringy. Outside memeable content, only idiots call themselves a “pro”. If something is my profession, I could see someone calling themselves a “professional <whatever>” (not that I would use it), but “professional” has a profoundly distinct ring to it, because it also refers to a code of conduct / a way to conduct business.

“I’m a pro” and anything like it is just hot air coming from bullshitters who are mostly responsible for enshittification of any given technology.

TrickDacy@lemmy.world · 4 months ago

A lot of rage for a small amount of confidence

chiisana@lemmy.chiisana.net · 4 months ago

Here we observe a pro gatekeeper in their natural habitat…

Gsus4@mander.xyz · 4 months ago

elon?

LyD@lemmy.ca · 4 months ago

On the other hand, using ChatGPT for your Lemmy comments sticks out like a sore thumb

FaceDeer@fedia.io · 4 months ago

If you’re careless with your prompting, sure. The “default style” of ChatGPT is widely known at this point. If you want it to sound different you’ll need to provide some context to tell it what you want it to sound like.

Or just use one of the many other LLMs out there to mix things up a bit. When I’m brainstorming, I usually use Chatbot Arena to bounce ideas around; it’s a page where you can send a prompt to two randomly selected LLMs, and then by voting on which gave the better response you help rank them on a leaderboard. This way I get to run my prompts through a lot of variety.

EatATaco@lemm.ee · 4 months ago

Anyone who claims llms are a nothingburger is frankly wrong,

Exactly. When someone says that, it indicates to me that either they are ignorant (like they aren’t a programmer or haven’t used it) or they are a programmer who has used it but is not good at all at integrating new tools into their development process.

Don’t throw out the baby with the bathwater.

Yup. The problem I see now is that every mistake an AI makes is parroted over and over here and held up as an example of why the tech is garbage. But it’s cherry-picking. Yes, they make mistakes; I often scratch my head at the AI results from Google and know to double-check them. But the number of times it has pointed me in the right direction way faster than search results has already shown me how useful it is.

nephs@lemmygrad.ml · 4 months ago

Omg, I feel sorry for the people cleaning up after those codebases later. Maintaining that kind of careless “quality” lines of code is going to be a job for actual veterans.

And when we’re all retired or dead, the whole world will be a pile of alien artifacts from a time when people were still able to figure stuff out, and LLMs will still be ridiculously inefficient for precise tasks, just like today.

https://youtu.be/dDUC-LqVrPU

Specal@lemmy.world · 4 months ago

I’ve found that the better I’ve gotten at writing prompts and giving it enough information not to hallucinate, the better the answers I get. It has to be treated as what it is, a calculator that can talk: make sure it has all of the information and it will find the answer.

One thing I have found to be super helpful with GPT-4o is the ability to give it full API pages so it can update and familiarise itself with what it’s working with.

MajorHavoc@programming.dev · 4 months ago

As a fellow pro, who has no issues calling myself a pro, because I am…

You’re spot on.

The stuff most people think AI is going to do - it’s not.

But as an insanely convenient auto-complete, modern LLMs absolutely shine!

sylver_dragon@lemmy.world · 4 months ago

I think AI is good at giving answers to well-defined problems. The issue is that companies keep trying to throw it at poorly defined problems, and the results are less useful. I work in the cybersecurity space, and you can’t swing a dead cat without hitting a vendor talking about AI in their products. It’s the new, big marketing buzzword. The problem is that finding the bad stuff on a network is not a well-defined problem. So instead, you get the unsupervised models faffing about, generating tons and tons of false positives. The only useful implementations of AI I’ve seen in these tools actually mirror your own: they can be scary good at generating data queries from natural language prompts. Which is, once again, a well-defined problem.

Overall, AI is a tool and used in the right way, it’s useful. It gets a bad rap because companies keep using it in bad ways and the end result can be worse than not having it at all.

dgmib@lemmy.world · 4 months ago

Sometimes ChatGPT/copilot’s code predictions are scary good. Sometimes they’re batshit crazy. If you have the experience to be able to tell the difference, it’s a great help.

EatATaco@lemm.ee · 4 months ago

Due to confusing business domain terms, we often name variables in the form of XY and YX.

One time Copilot autogenerated about two hundred lines of a class that was like: XY; YX; XXY; XYX; XYXY; … XXYYXYXYYYXYXYYXY;

It was pretty hilarious.

But that being said, it’s a great tool that has definitely proven to be worth the cost… but, like with a co-op, you have to check its work.

fossilesque@mander.xyz · 4 months ago

I find the mistakes it makes, and troubleshooting them, really good for learning. I’m self-taught.

Potatos_are_not_friends@lemmy.world · 4 months ago

Pretty much this. Experienced developers see AI as just a next-level lorem ipsum.

jsomae@lemmy.ml · 4 months ago

Sure, but by randomly guessing code you’d get 0%. Getting 48% right is actually very impressive for an LLM compared to just a few years ago.

InvaderDJ@lemmy.world · 4 months ago

You can also play with it to try and get closer to correct. I had problems getting an Excel macro working and getting unattended-upgrades working on my Pi-hole. GPT was wrong at first, but it got me partly there, and I could massage the question, Google around, and get closer to the right answer. Without it, I wouldn’t have been able to get any of it working, especially the macro.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 4 months ago

Exactly, I also find that it tends to do a pretty good job pointing you in the right direction. It’s way faster than googling or going through sites like Stack Overflow because the answers are contextual. You can ask about the specific thing you want to do and get an answer that gives you a general idea of what to do. For example, I’ve found it to be great for crafting complex SQL queries. I don’t really care if the answer is perfect, as long as it gives me an idea of what I need to do.

xthexder@l.sw0.com · 4 months ago

Just useful enough to become incredibly dangerous to anyone who doesn’t know what they’re doing. Isn’t it great?

jsomae@lemmy.ml · 4 months ago

Now non-coders can finally wield the foot-gun once reserved only for coders! /s

Truth be told, computer engineering should really be something that one needs a licence to do commercially, just like regular engineering. In this modern era, where shoddy software can be as ruinous to someone’s life as shoddy engineering, why is it not like this already?

iopq@lemmy.world · 4 months ago

Look, nothing will blow up if I mess up my proxy setup on my machine. I just won’t have internet until I revert my change. Why would that be different if I were getting paid for it?

cows_are_underrated@discuss.tchncs.de · 4 months ago

Nothing happens if you fuck up your proxy, but if you develop an app that gets very popular and don’t care about security, hackers may be able to take control of your whole server and do a lot of damage. If you develop software for critical infrastructure, fucking up your security systems can actually cost human lives.

iopq@lemmy.world · 4 months ago

Yes, but people with master’s degrees also fuck this up, so it’s not like some accreditation system will solve the issue of people making mistakes.

cows_are_underrated@discuss.tchncs.de · 4 months ago

Yeah, but it’s probably more likely that the untaught will fuck up some stuff.

sajran@lemmy.ml · 4 months ago

Setting up a proxy is not engineering.

iopq@lemmy.world · 4 months ago

I have to actually modify the code to properly package it for my distro, so it’s engineering, because I have to make decisions about how things work.

sajran@lemmy.ml · 4 months ago

I don’t see how this supports your point, then. If “setting up a proxy” means “packaging it to run on thousands of user machines”, then isn’t there an obvious and huge potential for a disastrous fuckup?

THCDenton@lemmy.world · 4 months ago

It was pretty good for a while! They lowered the power of it, like Immortan Joe. Do not become addicted to AI.

dullbananas (Joseph Silva)@lemmy.ca · 4 months ago

If you become addicted to ChatGPT then that makes you a cloud cyborg

Crisps@lemmy.world · 4 months ago

In the short term it really helps productivity, but in the end the reward for working faster is more work. Just doing the hard parts all day is going to burn developers out.

birbs@lemmy.world · 4 months ago

I program for a living, and I think of it more as doing the interesting tasks all day rather than the mundane and repetitive ones. ChatGPT and GitHub Copilot are great for getting something roughly right that you can tweak to work the way you want.

katy ✨@lemmy.blahaj.zone · 4 months ago

I'll use Copilot in place of most of the times I've searched on Stack Overflow, or to do mundane things like generating repetitive code, but relying solely on it is the same as relying solely on Stack Overflow.

floofloof@lemmy.ca · 4 months ago

What’s especially troubling is that many human programmers seem to prefer the ChatGPT answers. The Purdue researchers polled 12 programmers — admittedly a small sample size — and found they preferred ChatGPT at a rate of 35 percent and didn’t catch AI-generated mistakes at 39 percent.

Why is this happening? It might just be that ChatGPT is more polite than people online.

It’s probably more because you can ask it your exact question (not just search for something more or less similar) and it will at least give you a lead that you can use to discover the answer, even if it doesn’t give you a perfect answer.

Also, who does a survey of 12 people and publishes the results? Is that normal?

brbposting@sh.itjust.works · 4 months ago

I have 13 friends who are researchers and they publish surveys like that all the time.

(You can trust this comment because I peer reviewed it.)

B0rax@feddit.de · 4 months ago

Even this Lemmy thread has more participants than the survey