Developers Put AI Bots to the Test of Writing Code (2023)

One Bay Area technical support specialist told me he’d had a secret advantage when a potential employer assigned a take-home programming problem. He’d used ChatGPT to generate a solution, then turned it in as his own work.

OpenAI reminds users that its ChatGPT is currently in a free “research preview” to “learn about its strengths and weaknesses.” And there are plenty of other options to explore as well.

The last month has also seen the launch of Hugging Face’s open source alternative, “HuggingChat” — and a set of dedicated coding tools like StarCoder Playground.

With so many AI-powered assistants waiting to be explored, we’ve now entered the phase where excited users try their own homegrown experiments — and share the results online.

Engineering is going to change forever.

I just fed GPT-4-32K nearly all of Pinecone’s docs, and the results blew my mind!

It helped me make architecture decisions, and it then wrote my code for me.

The future of AI-assisted development is here, and it’s beyond impressive. pic.twitter.com/JJjF3nYiIF

— Matt Shumer (@mattshumer_) May 1, 2023

Can these new AI-powered tools really generate code? With a few qualifications and caveats, the answer appears to be yes.

Informal Tests

It’s always been the provocative question lurking behind the arrival of powerful AI systems. In early 2022 Alphabet reported its DeepMind lab for AI research had created a computer programming system called “AlphaCode” which was already ranking “within the top 54%” of the coders competing on the site Codeforces. By November GitHub was experimenting with adding a voice interface to its impressive AI-powered pair programmer, Copilot.

But now the systems are facing some more informal tests.

Last month a game developer on the “Candlesan” YouTube channel shared ChatGPT’s efforts to recreate the popular mobile game Flappy Bird. While it took several iterations, the code was fully completed in about 90 minutes. It was written in C# in the Unity game engine — and even used the AI-generated art that the developer created using Midjourney.

The video hints at a possible future where developers use AI to get their work done faster.

“What I really like about this process is that while ChatGPT is taking care of the code, I get to focus my attention on design work,” explains the video’s enthusiastic game developer. “I get to position text elements on the screen, I decide the distance between the pipes, or the exact tuning numbers for how hard the bird flaps its wings.”

And in a later video, the same developer uses ChatGPT to code bots to play the game ChatGPT just built.

Acing the Coding Test

Can AI pass a professional coding test? Other experiments suggest the answer there is also “yes,” though not for every AI system. One such test appeared last month on the tech site HackerNoon, when Seattle-based full-stack developer Jorge Villegas tested GPT-4, Claude+, Bard, and GitHub Copilot on a practice exercise from the coding site Leetcode.com. Villegas distilled the question down to an unambiguous five-word prompt: “Solve Leetcode 214. Shortest Palindrome.”

Leetcode’s practice puzzle #214 challenges coders to look at a string, and change it into a palindrome (the shortest possible one) by only adding letters to the front of the string. “While I could have asked follow-up questions, I chose to only consider the initial response,” Villegas added.
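For readers who want to see what the puzzle involves, here is a minimal Python sketch of one well-known approach. It is not the code that any of the tested models produced. The function name shortest_palindrome is our own, and the sketch assumes the input never contains the "#" character, which holds if inputs are lowercase letters, as we understand the puzzle's constraints. The idea is to find the longest prefix of the string that is already a palindrome, using the prefix table from the Knuth-Morris-Pratt string-matching algorithm, and then prepend the reverse of whatever is left over.

```python
def shortest_palindrome(s: str) -> str:
    """Return the shortest palindrome obtainable by adding letters to the front of s.

    Assumption: "#" never appears in s, so it can serve as a separator.
    """
    rev = s[::-1]
    combined = s + "#" + rev

    # fail[i] = length of the longest proper prefix of combined[:i + 1]
    # that is also a suffix of it (the KMP prefix table).
    fail = [0] * len(combined)
    for i in range(1, len(combined)):
        k = fail[i - 1]
        while k > 0 and combined[i] != combined[k]:
            k = fail[k - 1]
        if combined[i] == combined[k]:
            k += 1
        fail[i] = k

    # The final table entry is the length of the longest palindromic prefix of s.
    longest = fail[-1]

    # Prepend the reverse of the part of s that follows that prefix.
    return rev[: len(s) - longest] + s


print(shortest_palindrome("abcd"))
```

For the input "abcd" only the first letter is a palindromic prefix, so the remaining "bcd" is reversed and placed in front.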

It’s a tricky puzzle — and the results were some hits and some misses…

  • GPT-4 wrote code that passed all of Leetcode’s tests — and even ran faster than 47% of submissions to the site by (presumably human) users. Villegas’ only caveat was that GPT-4 is slower to respond than the other sites — and that using its API “is also a lot more expensive and costs could ramp up quickly.”
  • Villegas also tested the Claude+ “AI assistant” from Anthropic, a company describing itself as “an AI safety and research company” that builds “reliable, interpretable, and steerable AI systems.” But unfortunately, the code it produced failed all but one of Leetcode’s 121 tests.
  • Google’s “experimental AI service” Bard failed all but two of Leetcode’s 121 tests. (Bard’s code also contained a bug so obvious that Villegas felt compelled to correct it himself: the method was missing Python’s conventional self parameter, the first argument through which an instance method receives the object it was called on.)
  • Villegas tested GitHub Copilot (asking the question by typing it as a comment in Microsoft’s Copilot-powered VSCode). And it passed every one of Leetcode’s tests — scoring better than 30% of submissions (from presumably human coders).
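The self bug mentioned in the Bard result is easy to reproduce. The sketch below is a hypothetical reconstruction, not Bard’s actual output: it assumes the bug was a method defined inside a class without self as its first parameter. The class and method names are invented for illustration.

```python
class WithoutSelf:
    # Bug: no `self` parameter, so Python binds the instance to `s`
    # and has nowhere to put the string argument.
    def shortest_palindrome(s):
        return s


class WithSelf:
    # Fix: the instance arrives as `self`, the string as `s`.
    def shortest_palindrome(self, s):
        return s  # placeholder body; only the signature matters here


try:
    WithoutSelf().shortest_palindrome("abcd")
except TypeError as err:
    error_message = str(err)
    print(error_message)

print(WithSelf().shortest_palindrome("abcd"))
```

Calling the broken version fails before any of its logic runs, which is why a bug like this tends to show up on the very first test.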

Villegas’s essay closes with an important caveat. “It is unclear whether any of these models were pre-trained on Leetcode data.” So in early May Villegas tried another more specialized test, using a slightly longer prompt that requested four different CSS features written with a specific framework.

“Create a header component using Tailwind CSS that includes a logo on the left, navigation links in the center, and a search bar on the right. Make the header dark purple.”

The results from GPT-4 “overall looks very good” and Claude+ made “a pretty good attempt,” while for Bard’s response, “the nav links have no space between them, the search bar is illegible against the background… I guess it still got the main parts of the prompt correct, all the content is in the correct order.” And Bing’s version of GPT-4 was the only one that actually got the navigation links in the center.

Villegas’s ultimate verdict is that AI-generated code “often lacks attention to detail and can result in design flaws. Additionally, AI still struggles with context awareness, and it can be challenging to provide precise instructions that an AI can follow accurately.

“These difficulties demonstrate that AI cannot replace human designers entirely but can be a valuable tool to assist them in their work.”

I asked ChatGPT to write a python script that generates an image of a bird pic.twitter.com/mwd3FEHZkR

— Bruno Gavranović (@bgavran3) December 3, 2022

Plugins and PHP

ZDNet attempted some even more ambitious tests.

Senior contributing editor David Gewirtz had used ChatGPT back in February to generate a working WordPress plugin for his wife. It randomized items on a list, though a series of additional feature requests eventually tripped it up, with ChatGPT failing to sanitize the input when calling PHP within HTML.

While Gewirtz decided this was only coding at the “good enough” level, he also noted that this is what many clients actually want. This led Gewirtz to conclude that AI will “almost undoubtedly” reduce the number of human programming gigs, adding that even today AI is “definitely an option for quick and easy projects… this surge in high-quality generative AI has been startling to me.”

In April he’d tried the same test using Google’s Bard, but it generated a plugin that didn’t work. It just produced blank output rather than a list of names in random order. Bard also got tripped up when asked for a simple rewrite of an input checker so it would allow decimal values as well as integers (Bard’s version would have allowed letters and symbols to be placed to the right of the decimal). And when testing both Bard and ChatGPT on some buggy PHP code, only ChatGPT correctly identified the flaw. “For the record, I looked at all three of Bard’s drafts for this answer, and they were all wrong,” Gewirtz wrote.
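To make the input-checker task concrete, here is a minimal Python sketch of a validator that accepts integers and decimal values but rejects letters or symbols after the decimal point. Gewirtz’s original checker is not reproduced in this article, so the function name is_valid_number and the exact rules (ASCII digits only, no sign, no bare trailing dot) are assumptions made for illustration.

```python
import re

# One or more digits, optionally followed by a dot and one or more digits.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def is_valid_number(text: str) -> bool:
    """Return True for inputs like "42" or "3.14", False for anything else."""
    return _NUMBER.fullmatch(text) is not None


for sample in ["42", "3.14", "3.1a", "3.$", "3.", ""]:
    print(repr(sample), is_valid_number(sample))
```

Using fullmatch rather than search is the design choice that matters here: the whole input has to fit the pattern, so stray characters to the right of the decimal point cannot slip through.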

But then Gewirtz decided to push ChatGPT to write a “hello world” program in 12 different programming languages. Gewirtz used the top 12 most popular programming languages (as ranked by O’Reilly) — Java, Python, Rust, Go, C++, JavaScript, C#, C, TypeScript, R, Kotlin, and Scala — and ChatGPT dutifully complied (even providing the appropriate syntax coloring for them all).


To make things more challenging, his prompt even requested different messages for the morning, evening, and afternoon. While Gewirtz didn’t run the code, “I did read through the generated code and — for most languages — the code looked good.” And a quick test of the JavaScript code shows it does indeed perform as expected.
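A time-aware “hello world” of the kind described might look like the following Python sketch. It is not ChatGPT’s output. The cut-off hours (noon and 6 p.m.), the message wording, and the function name greeting are all assumptions, since the article does not quote the prompt’s exact requirements.

```python
from datetime import datetime


def greeting(hour: int) -> str:
    """Pick a message based on the hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour < 12:   # assumed cut-off: before noon is morning
        return "Good morning, world!"
    if hour < 18:   # assumed cut-off: before 6 p.m. is afternoon
        return "Good afternoon, world!"
    return "Good evening, world!"


if __name__ == "__main__":
    print(greeting(datetime.now().hour))
```

Passing the hour in as an argument, rather than reading the clock inside the function, keeps the logic easy to check by hand for each of the three cases.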

I am using #ChatGPT to write computer code. Game changer on the order of electricity and personal computing.

Coding is a killer app. Can’t wait for it to be integrated FULLY into most IDEs. I tried current “chat” programming integrations but they are terrible compated to…

— JM Rothberg (@JMRothberg) February 19, 2023

Just for fun, Gewirtz also asked it to produce results using the legacy Forth programming language, and it did. So then in a later article, Gewirtz challenged ChatGPT to write code in 10 more “relatively obscure languages,” including Fortran, COBOL, Lisp, Algol, Simula, RPG (Report Program Generator), IBM’s BAL (Basic Assembly Language), and Xerox PARC’s Smalltalk.

In short, Gewirtz took ChatGPT through a history of programming languages dating as far back as the 1950s. And he described the results as “cool beyond belief.” Though he didn’t run the generated code, “most look right, and show the appropriate indicators telling us that the language presented is the language I asked for…”

ChatGPT even rose to Gewirtz’s challenge of writing code in another ancient language, APL, which sometimes uses a non-standard character set, though the font used to display its code transformed those characters into what Gewirtz calls “little glyphs.”


But perhaps the most thought-provoking result of all came when ChatGPT generated code in equally ancient Prolog. Gewirtz found this especially notable because, by his account, ChatGPT uses a mode that translates Prolog logical forms into sentences in natural language.

With so many examples of AI assistants already generating code, maybe it’s time to move on to the question of how they’ll ultimately be used. That is a question we’ll be watching in the months and years to come.

I have discovered I don't need to write full English to GPT-3.5; just a bunch of keywords is enough. E.g., "argparse fixed set of valid values for flag" or "python test equality between two tuples but report only components that differed"

— Edward Z. Yang (@ezyang) April 28, 2023
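The two keyword prompts quoted in the tweet above map onto small, standard pieces of Python. The sketch below shows one plausible answer to each and is not the output GPT-3.5 actually gave. The flag name --mode, its allowed values, and the helper names build_parser and differing_components are invented for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """A flag restricted to a fixed set of valid values via `choices`."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--mode", choices=["fast", "slow"], default="fast")
    return parser


def differing_components(a: tuple, b: tuple) -> list:
    """Return (index, left, right) for each position where the tuples differ.

    zip() stops at the shorter tuple, so a length mismatch should be
    checked separately by the caller.
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


args = build_parser().parse_args(["--mode", "slow"])
print(args.mode)
print(differing_components((1, 2, 3), (1, 5, 3)))
```

With choices set, argparse rejects any other value for the flag and exits with a usage error, so no hand-written validation is needed.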

FAQs

Can chatbot write code? ›

Bard has learned a new trick. Google's AI-powered chatbot can now write, debug and even explain code in more than 20 programming languages, "one of the top requests we've received from our users," Google announced Friday.

How good is ChatGPT at writing code? ›

ChatGPT has been designed to help coders solve programming problems faster and provide correct answers for simple programs. While it can accurately identify certain fixes for errors and can provide answers to simpler programming questions, it is not able to deliver satisfying solutions to more advanced problems.

Can AI write complex code? ›

AI is not yet able to write complex code as well as a human programmer, but it is becoming increasingly capable of completing this task. Programming a computer with artificial intelligence (AI) allows it to make decisions on its own.

Will ChatGPT replace front end developers? ›

One prediction, made before the launch of GitHub Copilot and ChatGPT, held that AI would soon replace developers, or at least front-end developers, and that this would happen by 2030.

Can AI write code for me? ›

Writing code with generative AI is possible through a technique known as neural code generation. This involves training a neural network on a large dataset of code examples, and then using the trained network to generate code that is similar in structure and function to the examples it has been trained on.

Has chatbot stopped writing code? ›

A chatbot, as an AI language model, is not capable of writing code on its own.

What is the hardest code to write? ›

Malbolge is by far the hardest programming language to learn, which can be seen from the fact that it took no less than two years to finish writing the first Malbolge code.

How much time do developers spend actually writing code? ›

Developers code less than one hour per day.

Is writing code good for your brain? ›

Coding activates the brain's learning centers

Since coding tasks require a range of complex skills, the brain adapts to reinforce associations between distinct parts of the brain. Forming these flexible intra-brain connections is a great workout for the brain, strengthening its ability to learn, memorize, and perform.

What language is most AI coded in? ›

#1 Python. Although Python was created before AI became crucial to businesses, it's one of the most popular languages for Artificial Intelligence. Python is the most used language for Machine Learning (which lives under the umbrella of AI).

Is it possible to code an AI like Jarvis? ›

The answer is yes, and it's not as far-fetched as one may think. With the right combination of technologies and platforms, we can create an AI-powered personal assistant that can manage various aspects of our lives.

Which AI turns words to code? ›

Codex can go from text to code, taking commands written in plain English and bringing them to life.

Is ChatGPT threat to programmers? ›

No, ChatGPT is not a threat to software engineers. Instead, it can be used to enhance productivity and automate repetitive and time-consuming tasks, freeing up time for engineers to focus on higher-level tasks.

Will ChatGPT replace Google? ›

ChatGPT is not replacing Google. OpenAI's chatbot is not designed to act as a search engine. It functions well as a question-answering chatbot and a personal assistant for a variety of tasks. So, if you were hoping to use ChatGPT to find your local bus schedule you may want to think again.

Will coders be replaced by AI? ›

While AI is currently being used to improve software engineering, many people fear that it could eventually replace human developers altogether. But is this really the case? The truth is that AI is unlikely to replace high-value software engineers who build complex and innovative software.

Can AI write better code than humans? ›

In simple words–AI can't write better code than humans. But it's also important to understand that this isn't the ultimate goal of AI usage in programming. Instead, AI developers are creating tools that will help other programmers and creators build powerful solutions with less effort.

Can AI detect AI writing? ›

Three researchers from the MIT-IBM Watson AI lab and Harvard NLP group created a great free tool to help detect machine-generated text content named the Giant Language Model Test Room (or GLTR, for short). GLTR is currently the most visual way to predict if casual portions of text have been written with AI.

Will AI become self aware? ›

The CEO of Alphabet's DeepMind said there's a possibility that AI could become self-aware one day. This means that AI would have feelings and emotions that mimic those of humans. DeepMind is an AI research lab that was co-founded in 2010 by Demis Hassabis.

Will no-code replace programmers? ›

Let's be clear upfront: low-code will not replace high-code developers working in languages like Java, C++, or Python. Citizen developers won't replace senior developers with decades of experience or even junior developers with a year or so on the job.

Will no-code platforms replace developers? ›

These tools are getting better, but they won't replace developers any time soon.

What is the common problem of chatbot? ›

Chatbots have limited responses, so they're not often able to answer multi-part questions or questions that require decisions. This often means your customers are left without a solution, and have to go through more steps to contact your support team.

What is the most unbreakable code? ›

There is only one provably unbreakable cipher: the one-time pad, often called the Vernam cipher after Gilbert Vernam, who patented its precursor in 1919. It relies on a genuinely random key that is at least as long as the message and is never reused.
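The Vernam cipher mentioned above can be sketched in a few lines of Python. This is an illustration of the principle only, not production cryptography. The helper names make_key and xor_bytes are our own, and the sketch assumes the key is used for exactly one message and is as long as that message.

```python
import secrets


def make_key(length: int) -> bytes:
    """Generate a random key of the given length from the OS's secure source."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte.

    The same function encrypts and decrypts, because XOR-ing twice
    with the same key restores the original bytes.
    """
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"attack at dawn"
key = make_key(len(message))
ciphertext = xor_bytes(message, key)
recovered = xor_bytes(ciphertext, key)
print(recovered)
```

The security argument depends entirely on the key: if it is reused or is not truly random, the guarantee no longer holds.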

What is the most confusing coding language? ›

Malbolge. Malbolge (named after the 8th circle of Hell) was designed to be the most difficult and esoteric programming language. Among other features, code is self-modifying by design and the effect of an instruction depends on its address in memory.

What was the hardest code ever cracked? ›

One of the hardest codes to crack is arguably the Advanced Encryption Standard (AES, also known as Rijndael), which the US government uses to protect classified information up to the top-secret level. No practical attack that breaks the full cipher is publicly known.

Is 1 hour of coding a day enough? ›

It is true that the more time you put in, the faster you'll learn, but if you're okay with a longer timeframe, an hour a day is plenty. In fact, if you had the choice to spend ten hours learning to code over the weekend versus spending one hour each day of the week, I'd recommend the latter.

What is the average day in the life of a coder? ›

On a typical day, a computer programmer can be involved in many different coding projects. Daily duties might include: Writing and testing code for new programs. Computer programmers work closely with web and software developers to write code for new mobile applications or computer programs.

Do programmers code 8 hours a day? ›

Typically, computer programmers spend an average of 40 hours per week on their jobs, which works out to eight hours per day, Monday through Friday. Programmers usually work from 9 a.m. to 5 p.m. or keep schedules comparable to typical office culture.

Is reading code harder than writing code? ›

The first reason code is harder to read than to write has to do with the sheer amount of data you need to keep in your head in order to read code. When you write code, you only need to remember the variables, algorithms, data, etc. relevant to the feature you are currently writing.

What kind of brain is good at coding? ›

In each case, the same part of the brain lit up: the area responsible for logical reasoning. And though the act of logical reasoning has no brain hemisphere preference, coding strongly favored the left hemisphere, the area that correlates with language.

Why is coding so addictive? ›

One characteristic of a coder's or developer's job is the ever-changing nature of programming. Most developers hunger and thirst for knowledge and for learning new things. That is what makes coding so addictive: you're growing your skills every time.

Who is father of AI? ›

John McCarthy is one of the "founding fathers" of artificial intelligence, together with Alan Turing, Marvin Minsky, Allen Newell, and Herbert A. Simon.

Is C++ the best language for AI? ›

C++ executes code quickly, making it an excellent choice for machine learning and neural network applications. Many AI-focused applications are relatively complex, so using an efficient programming language like C++ can help create programs that run exceptionally well.

Which language is not mostly used for AI? ›

Perl is not commonly used for AI. LISP and Prolog are the two languages that have been broadly used for AI innovation, and Python is the most preferred language for AI and machine learning.

What was Jarvis coded in? ›

JARVIS is a voice-based AI assistant developed in the Python programming language. It uses different technologies to add new features, and it can automate tasks with just one voice command.

What AI is closest to Jarvis? ›

5 Best Jarvis (Jasper AI) Alternatives You Should Consider
  1. Copysmith: The Jarvis Alternative For Large eCommerce Teams. Copysmith is the best Jasper alternative for eCommerce and large marketing teams. ...
  2. Rytr. ...
  3. Writesonic. ...
  4. Anyword. ...
  5. Copy AI.

Who was Jarvis programmed by? ›

J.A.R.V.I.S. is an artificial intelligence created by Tony Stark that later controls his Iron Man and Hulkbuster armor for him.

What is black box AI? ›

Black box AI is any artificial intelligence system whose inputs and operations aren't visible to the user or another interested party. A black box, in a general sense, is an impenetrable system. Black box AI models arrive at conclusions or decisions without providing any explanations as to how they were reached.

Can chatbot write Python code? ›

Chatbots can provide real-time customer support and are therefore a valuable asset in many industries. When you understand the basics of the ChatterBot library, you can build and train a self-learning chatbot with just a few lines of Python code.

What programming language is chatbot written in? ›

Python. Python is a preferred language for data projects, machine learning projects, and chatbot projects. It has a simple syntax that even beginner developers find easy to read and understand.

Can chatbot write my essay? ›

The short answer is yes, but with some limitations. We're going to look at how to write essays with ChatGPT and other AI tools. We're also going to examine the pros and cons of using ChatGPT and discuss why we think you still need the human touch for the best results.

Article information

Author: Dean Jakubowski Ret

Last Updated: 11/10/2023
