UPDATED 12:31 EDT / FEBRUARY 28 2024

StarCoder2 AI code generator released with support for 619 programming languages

ServiceNow Inc., Hugging Face Inc. and Nvidia Corp. today released StarCoder2, the latest version of the trio’s StarCoder family of open-source large language models for code generation.

The companies said StarCoder2 is faster and more flexible than its predecessor and includes features that protect against intellectual property infringement.

Trained on 619 programming languages, StarCoder2 was developed in partnership with the BigCode Community, a research project managed by ServiceNow and Hugging Face that was launched last year with the debut of the original StarCoder. The model’s foundation is a new code dataset called Stack v2, which is more than seven times larger than Stack v1. New training techniques also help the model understand low-resource programming languages such as Cobol, as well as mathematics and discussions of program source code.

StarCoder2 can be fine-tuned and embedded in enterprise applications to perform tasks such as source code generation, workflow generation and text summarization, the companies said. Developers can use its code completion, code summarization, code snippet retrieval and other capabilities to write code faster.

Choose your size

The model comes in three sizes: a 3-billion-parameter model trained by ServiceNow, a 7-billion-parameter model trained by Hugging Face and a 15-billion-parameter model built by Nvidia with its NeMo generative AI framework and trained on Nvidia infrastructure. The smaller variants save on computing costs because fewer parameters require less processing during the inferencing stage, when models make deductions based on their training data. They can also run on a consumer-grade graphics processing unit.
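To give a rough sense of why parameter count matters for hardware, the sketch below estimates the memory needed just to hold each model's weights. It is a back-of-the-envelope illustration, not a figure from the companies: it assumes the nominal parameter counts above and 16-bit weights by default, and it ignores activations, caches and framework overhead. The function name weight_memory_gb is made up for this example.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumptions (not from the article): nominal parameter counts, 16-bit
# weights by default, and no allowance for activations, caches or overhead.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Approximate weight storage in gigabytes (1 GB = 10^9 bytes)."""
    # One billion parameters at one byte each is one gigabyte,
    # so the billions cancel and only the bytes-per-parameter factor remains.
    return params_billions * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for size in (3, 7, 15):
        print(f"{size}B parameters: ~{weight_memory_gb(size):.0f} GB at 16-bit")
```

Under these assumptions the three sizes need roughly 6 GB, 14 GB and 30 GB for weights alone, which shows how memory demand grows with parameter count. Actual requirements depend on the precision and runtime used.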

The companies said StarCoder2’s 3-billion-parameter model matches the performance of the original StarCoder’s 15-billion-parameter model and can make more accurate predictions because it is trained on a larger corpus of languages. They said its broader and deeper training enables the model to provide better context-aware predictions.

Software development has been a prime usage area for AI, spurred in part by early successes like GitHub Inc.’s Copilot and Amazon Web Services Inc.’s CodeWhisperer. A recent GitHub survey found that 91% of U.S. developers use AI coding tools. However, a survey by CoderPad Inc. also reported that nearly one-quarter of developers are skeptical about AI’s value, and 28% said their employer prohibits its use.

Transparency play

Among the major reasons for hesitation are fears that code generators produce inefficient code, introduce security vulnerabilities and can infringe intellectual property by generating code based on copyrighted material in their training data. A recent Stanford University study found that AI assistants created insecure code in lab environments.

The three sponsoring companies are addressing these concerns by stressing transparency. StarCoder2 was built using responsibly sourced data under license from Software Heritage, which hosts what it says is the largest public collection of source code. The model’s supporting code will reside on the BigCode project’s GitHub page. It’s being made available under the BigCode OpenRAIL-M license, allowing royalty-free access and use.

While the license has many open-source characteristics, it isn’t technically a full open-source license. RAIL-M imposes restrictions prohibiting licensed software from providing medical advice or administering justice, for example. It has also been criticized for being too vague.

All StarCoder2 models will also be available for download from Hugging Face, and the StarCoder2 15-billion-parameter model is available on Nvidia AI Foundation models.
