Large Language Models for Code Generation – Part 2

Will code-generating LLMs replace developers? Probably not anytime soon

Introduction

In part 1, we explored what code-generating LLMs are and how they work. We saw some examples of how developers can use code-generating LLM technology to be more productive and efficient.

As a developer myself, I couldn’t be more excited about tools like StarCoder and GitHub Copilot for code completion. What else can these tools help me with?

Let’s take a look at some advanced features of code-generating LLMs.

What’s next for code-generating LLMs

If we look at Copilot Labs (a VSCode extension), it shows us a hint of what’s coming next:

  • Explain code: you can highlight a section of your code and Copilot will explain what the code is doing. This can be quite helpful for developers who are trying to learn a new codebase.
  • Language translation: ever wanted to translate your Java code into Go? Well, now you can do just that with Copilot Labs.
  • Code improvements: add documentation, make the code more readable, add type annotations, and more (see the sketch after this list).
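
To make the “code improvements” idea concrete, here is a small hand-written before/after sketch of the kind of transformation these tools suggest; it is illustrative only, not actual Copilot Labs output:

# Before: a terse, untyped helper
def total(prices, tax):
    return sum(prices) * (1 + tax)

# After: the kind of improvement a code-improvement tool might suggest (illustrative)
def total_with_tax(prices: list[float], tax_rate: float) -> float:
    """Return the sum of prices with tax_rate applied, e.g. 0.07 for 7% tax."""
    return sum(prices) * (1 + tax_rate)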

Can code-generating LLMs help us with code refactoring? Yes, they sure can.
I took the code example from PyCharm’s refactoring tutorial on simplifying rational numbers.

Me: Can you please refactor the following Python code:

from collections import namedtuple

class Rational(namedtuple('Rational', ['num', 'denom'])):
    def __new__(cls, num, denom):
        if denom == 0:
            raise ValueError('Denominator cannot be null')
        if denom < 0:
            num, denom = -num, -denom

        x = abs(num)
        y = abs(denom)
        while x:
            x, y = y % x, x
        factor = y

        return super().__new__(cls, num // factor, denom // factor)

    def __str__(self):
        return '{}/{}'.format(self.num, self.denom)

GPT-4 not only does a great job, it also explains what it did:

Here’s the refactored code:

from collections import namedtuple
from math import gcd

class Rational(namedtuple('Rational', ['num', 'denom'])):
    def __new__(cls, num, denom):
        if denom == 0:
            raise ValueError('Denominator cannot be null')

        factor = gcd(abs(num), abs(denom))

        # Ensure the denominator is always positive
        sign = -1 if num * denom < 0 else 1
        num = sign * abs(num) // factor
        denom = abs(denom) // factor

        return super().__new__(cls, num, denom)

    def __str__(self):
        return '{}/{}'.format(self.num, self.denom)

In this refactored code, I have:
1. Imported gcd function from the math module to simplify the calculation of the greatest common divisor (GCD) between the numerator and denominator.
2. Removed the loop for calculating the GCD and replaced it with the gcd() function.
3. Simplified the handling of the sign for the numerator and denominator by using a sign variable.
4. Updated the formatting in the __str__ method to match the rest of the code.
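
As a quick sanity check (my own addition, not part of the GPT-4 response), the refactored class still reduces fractions and normalizes signs the way the original did:

# Assumes the refactored Rational class above is defined in scope
print(Rational(6, 8))    # 3/4  (reduced by gcd(6, 8) == 2)
print(Rational(2, -4))   # -1/2 (the sign is moved to the numerator)
print(Rational(0, 5))    # 0/1  (a zero numerator collapses to 0/1)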

This is really awesome. It’s a great refactoring, and it also teaches you a thing or two about current best practices for code refactoring.

GPT-4 can also help with refactoring for speed. Here’s an example:

Me: Please refactor this function to run faster:

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)

GPT-4:

The given function uses a naive recursive approach which has exponential time complexity. We can improve the performance by using memoization to store intermediate results and avoid redundant calculations. Here’s a refactored version using memoization:

def fibonacci(n, memo=None):
    if memo is None:
        memo = {}
    if n == 0:
        return 0
    elif n == 1:
        return 1
    elif n in memo:
        return memo[n]
    else:
        memo[n] = fibonacci(n - 1, memo) + fibonacci(n - 2, memo)
        return memo[n]

This version of the function will run significantly faster, as it only computes the Fibonacci numbers for each value of n once and reuses the stored results.
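
Worth noting (my addition, not part of the GPT-4 response): Python’s standard library can handle the memoization for you. A minimal sketch using functools.lru_cache achieves the same effect as the hand-rolled dictionary above:

from functools import lru_cache

@lru_cache(maxsize=None)  # cache every Fibonacci number computed so far
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(100))  # 354224848179261915075, returned almost instantly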

Code generation and privacy

Can I point code-generating LLMs at my internal code and have them provide answers specific to my codebase? Yes, of course: that’s what GitHub Copilot does, providing responses based on the context of your code inside VSCode.

But that means my code gets sent to the model, which has potential privacy implications. For example, would machine learning engineers developing Bard at Google trust that their code won’t leak to OpenAI if they used such a tool? Probably not.

Will they eventually replace us?

We’ve seen how incredibly useful code-generating LLMs can be, but this leads to the obvious question: will they get so good that they will eventually replace developers altogether?

I honestly don’t think so. At least not anytime soon.

Software development is not only about working on your laptop and being a “code monkey”. It’s a lot more than that. To create quality software, you need to understand the requirements, design the right architecture, choose deployment platforms, and make various design decisions along the way. 

I see code-generating LLMs as amazingly powerful productivity tools that help developers focus on what really matters, instead of memorizing “how do I do X in JavaScript” or “what is the method name for doing Y in Pandas”. Developers will still be responsible for design and architecture, will need to review and test the code generated by LLMs, do broader refactorings, and much more.

Summary

Not long ago, during a Vectara technical interview, the interviewer realized that the candidate had GitHub Copilot enabled in their environment, but nevertheless allowed the candidate to continue using it.

That is a true story, and the point is this: there is much more to excellence in software engineering than what GitHub Copilot provides.

At Vectara we don’t focus on code-generating LLMs as part of our platform, but our engineering team fully enjoys the productivity benefits these tools provide.

You can, however, use Vectara’s powerful neural search and LLM-powered application platform in a few interesting ways:

  • Neural search for your public or private API documentation. We already support Docusaurus in Vectara-ingest, and it’s easy to add your own documentation crawler as needed.
  • Add question-answering capabilities across all your other dev-related data stores like Jira, Asana, Notion, Trello, Basecamp, or Google Drive.

Get started with our free plan here.
