UPDATED 12:31 EDT / FEBRUARY 28 2024


StarCoder2 AI code generator released with support for 619 programming languages

ServiceNow Inc., Hugging Face Inc. and Nvidia Corp. today released StarCoder2, the latest version of the trio’s StarCoder family of open-source large language models for code generation.

The companies said StarCoder2 is faster and more flexible than its predecessor and includes features that protect against intellectual property infringement.

Trained on 619 programming languages, StarCoder2 was developed in partnership with the BigCode Community, a research project managed by ServiceNow and Hugging Face that released the original StarCoder last year. The model's foundation is a new code dataset called The Stack v2, which is more than seven times larger than The Stack v1. Together with new training techniques, the dataset helps the model understand low-resource programming languages such as COBOL, as well as mathematics and discussions about program source code.

StarCoder2 can be fine-tuned and embedded in enterprise applications to perform tasks such as source code generation, workflow generation and text summarization, the companies said. Developers can use its code completion, code summarization, code snippet retrieval and other capabilities to write code faster.

Choose your size

The model comes in three sizes: a 3-billion-parameter model trained by ServiceNow, a 7-billion-parameter model trained by Hugging Face and a 15-billion-parameter model built by Nvidia with its NeMo generative AI framework and trained on Nvidia infrastructure. The smaller variants save on computing costs because fewer parameters require less processing during inference, the stage at which a trained model generates output for new input. They can also run on a consumer-grade graphics processing unit.
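A back-of-the-envelope calculation (ours, not the companies') shows why parameter count matters for hardware: the memory needed just to hold a model's weights is roughly the number of parameters times the bytes stored per parameter. The sketch below assumes the nominal sizes of 3, 7 and 15 billion parameters and 16-bit weights, and it ignores activations, caches and framework overhead, so real requirements are higher.

```python
# Rough estimate of the memory needed to hold model weights.
# Assumptions: nominal parameter counts, weights only (no activations,
# caches or framework overhead), decimal gigabytes (1 GB = 1e9 bytes).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return approximate weight memory in GB for a given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

# Nominal StarCoder2 sizes as described in the article.
SIZES = {"3B": 3e9, "7B": 7e9, "15B": 15e9}

for name, params in SIZES.items():
    print(f"StarCoder2-{name}: about {weight_memory_gb(params):.0f} GB of 16-bit weights")
```

Under these assumptions the 3-billion-parameter variant needs about 6 GB for its weights and the 15-billion-parameter variant about 30 GB, which is consistent with the smaller model being the one suited to a single consumer GPU.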

The companies said StarCoder2's 3-billion-parameter model matches the performance of the original StarCoder's 15-billion-parameter model and can make more accurate predictions because it is trained on a larger corpus of languages. They said its broader and deeper training enables the model to provide better context-aware predictions.

Software development has been a prime use case for AI, spurred in part by early successes like GitHub Inc.'s Copilot and Amazon Web Services Inc.'s CodeWhisperer. A recent GitHub survey found that 91% of U.S. developers use AI coding tools. However, a survey by CoderPad Inc. reported that nearly one-quarter of developers are skeptical about AI's value, and 28% said their employer prohibits its use.

Transparency play

Among the major reasons for hesitation are fears that code generators produce inefficient code, introduce security vulnerabilities and infringe on intellectual property by generating code based on copyrighted material in their training data. A recent Stanford University study found that AI assistants can create insecure code in lab environments.

The three sponsoring companies are addressing these concerns by stressing transparency. StarCoder2 was built using responsibly sourced data under license from Software Heritage, which hosts what it says is the largest public collection of source code. The model's supporting code will reside on the BigCode project's GitHub page. The model is being made available under the BigCode OpenRAIL-M license, which allows royalty-free access and use.

While the license has many open-source characteristics, it isn’t technically a full open-source license. RAIL-M imposes restrictions prohibiting licensed software from providing medical advice or administering justice, for example. It has also been criticized for being too vague.

All StarCoder2 models will also be available for download from Hugging Face, and the 15-billion-parameter StarCoder2 model is available through Nvidia AI Foundation Models.
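For developers who want to try a downloaded model, the following is a minimal sketch of prompting it for a code completion. It assumes the checkpoints are published under the BigCode organization's Hugging Face repository IDs shown below and that a version of Hugging Face's `transformers` library with StarCoder2 support (plus PyTorch) is installed; the `CHECKPOINTS` table and `complete` function are hypothetical helpers written for illustration, not part of any official API.

```python
# Minimal sketch: prompt a StarCoder2 checkpoint for a code completion.
# Assumptions: BigCode repository IDs as listed below; a transformers release
# with StarCoder2 support and PyTorch installed. Helper names are hypothetical.
CHECKPOINTS = {
    "3b": "bigcode/starcoder2-3b",
    "7b": "bigcode/starcoder2-7b",
    "15b": "bigcode/starcoder2-15b",
}

def complete(prompt: str, size: str = "3b", max_new_tokens: int = 32) -> str:
    """Download (on first use) the chosen checkpoint and continue the prompt."""
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = CHECKPOINTS[size]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)

    # Tokenize the prompt as PyTorch tensors and generate a continuation.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `complete("def print_hello_world():")` would fetch the smallest model and return the prompt followed by the model's suggested continuation. The 3-billion-parameter default keeps the download and memory footprint small, in line with the article's point about running on consumer hardware.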

