ServiceNow and Hugging Face release open-source AI model for generating code
ServiceNow Inc. and Hugging Face Inc. today introduced StarCoder, an open-source artificial intelligence model that can generate code in multiple programming languages.
The companies claim that StarCoder is the most advanced model of its kind in the open-source ecosystem. It was developed through a research project that ServiceNow and Hugging Face launched last year. The project, which is called BigCode, drew contributions from not only the two companies’ engineers but also hundreds of other AI experts.
“The joint efforts led by Hugging Face and ServiceNow enable the release of powerful base models that empower the community to build a wide range of applications more efficiently than a single company could come up with,” said BigCode co-lead Leandro von Werra. “This endeavor is a testament to the potential of open‑source as we work toward democratizing AI.”
StarCoder is available in multiple versions. The core edition, StarCoderBase, features 15.5 billion parameters. Those are the settings that determine how an AI model goes about performing tasks such as generating code.
StarCoderBase was trained on a dataset called The Stack that includes code written in 358 programming languages. ServiceNow and Hugging Face didn’t use the entire dataset, but only code samples written in 86 of the supported programming languages.
During training, the companies also supplied StarCoderBase with software documentation and related technical information. In total, the AI model was trained on about one trillion tokens. A token is a unit of data that comprises a word, a word fragment or a few digits.
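To make the idea of a token concrete, here is a minimal Python sketch. It is not StarCoder's tokenizer: the function name `toy_tokenize` and the fixed four-character chunk size are assumptions made purely for illustration. Production models generally rely on a learned subword vocabulary instead of fixed-size chunks.

```python
import re

def toy_tokenize(text, max_piece=4):
    """Illustrative only: split text into word fragments, short digit
    groups and single symbols. Hypothetical helper, not a real tokenizer."""
    tokens = []
    # Find runs of letters, runs of digits, or any single non-space symbol.
    for piece in re.findall(r"[A-Za-z_]+|\d+|\S", text):
        # Break long runs into fragments of at most max_piece characters.
        for i in range(0, len(piece), max_piece):
            tokens.append(piece[i:i + max_piece])
    return tokens

sample = "total = price * 12345"
tokens = toy_tokenize(sample)
print(tokens)       # ['tota', 'l', '=', 'pric', 'e', '*', '1234', '5']
print(len(tokens))  # 8
```

In this toy scheme, a short line of code becomes eight tokens, which gives a rough sense of why a training set measured in tokens is much larger than the same set measured in words.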
ServiceNow and Hugging Face trained StarCoderBase on a cluster of 64 servers that, according to the companies, contained a total of 512 A100 graphics cards, which works out to an average of eight per server. The A100 was Nvidia Corp.’s flagship data center AI accelerator until the chipmaker introduced its newer H100 chip last year.
The companies claim that the AI model can not only generate code in multiple languages, but also do so more efficiently than many rival models. During an internal test, the companies compared StarCoderBase with multiple open-source alternatives. They determined that the AI outperforms all other open-source code generation models with built-in support for multiple programming languages.
ServiceNow and Hugging Face claim that the AI can also outperform an early version of OpenAI LLC’s Codex model. Codex powers GitHub Copilot, the AI coding assistant offered by Microsoft Corp.’s GitHub unit.
Alongside the core StarCoderBase model, there is an edition that has been trained on additional Python, Java and JavaScript code samples, which should translate into improved support for those three languages. There’s also an edition optimized specifically to generate Python code.