The Power of Code-Specific Instruction Data in Improving Code Language Models

Large Language Models (LLMs) such as OpenAI’s ChatGPT have attracted significant attention and achieved success across a wide range of tasks. The same holds for code-related tasks, where Code LLMs have shown remarkable performance. These models draw their strength from pre-training on vast amounts of internet data, followed by fine-tuning on instruction data.

However, there is still room to improve Code LLMs through fine-grained, code-specific instruction tuning, because most prior Code LLMs focus primarily on the pre-training phase. To address this gap, researchers from Microsoft and Hong Kong Baptist University drew on the Evol-Instruct approach to enhance the capabilities of StarCoder, an open-source Code LLM.

Their approach generates detailed code instruction data with a code-specific adaptation of Evol-Instruct. They streamlined the evolutionary prompting process, simplified the prompt format, refined the evolution instructions, and added heuristics for code debugging and time-space complexity constraints. They then fine-tune StarCoder on this newly generated instruction data to create WizardCoder.
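The article does not reproduce the authors' prompts. As a rough illustration of how evolution-based instruction generation works, the following minimal sketch wraps a seed programming question in a "make this harder" prompt built around one randomly chosen heuristic, then feeds the rewritten question back in for the next round. The template text, the heuristic wording, and the names `build_evolution_prompt`, `evolve`, and the `llm` callable are all hypothetical; a real pipeline would send each prompt to an instruction-following model rather than a stub.

```python
import random
from typing import Callable

# Hypothetical evolution heuristics, paraphrasing the kinds of changes the
# article describes (extra constraints, code debugging, complexity limits).
# This is NOT the authors' exact wording.
EVOLUTION_METHODS = [
    "Add new constraints and requirements to the original problem.",
    "Provide a piece of erroneous code as a reference to increase misdirection.",
    "Propose higher time or space complexity requirements.",
]

# Hypothetical prompt template (assumption, not taken from the paper).
PROMPT_TEMPLATE = (
    "Please increase the difficulty of the given programming question a bit.\n"
    "You can increase the difficulty using the following method:\n"
    "{method}\n\n"
    "Question:\n{question}"
)


def build_evolution_prompt(question: str, rng: random.Random) -> str:
    """Pick one heuristic at random and wrap the question in the template."""
    method = rng.choice(EVOLUTION_METHODS)
    return PROMPT_TEMPLATE.format(method=method, question=question)


def evolve(
    seed: str, rounds: int, llm: Callable[[str], str], seed_value: int = 0
) -> list[str]:
    """Run several evolution rounds, feeding each rewritten question back in.

    `llm` is any callable that maps a prompt to a new instruction; here it
    stands in for a call to an instruction-following model (assumption).
    Returns the seed followed by every evolved instruction.
    """
    rng = random.Random(seed_value)
    history = [seed]
    current = seed
    for _ in range(rounds):
        current = llm(build_evolution_prompt(current, rng))
        history.append(current)
    return history
```

With a stub model that just appends a marker, `evolve("Write a function that reverses a string.", 2, lambda p: p.split("Question:\n", 1)[1] + " (harder)")` returns the seed followed by two progressively "harder" versions. Keeping the full history lets every round contribute candidate training examples, which is one reasonable design choice for a sketch like this.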

Experimental results on four code-generation benchmarks, including HumanEval and DS-1000, show that WizardCoder surpasses all other open-source Code LLMs at code generation. Notably, it even outperforms closed-source LLMs such as Claude and Bard on HumanEval and HumanEval+, despite its smaller size.
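Results on benchmarks such as HumanEval are commonly reported as pass@k: the probability that at least one of k generated solutions passes the problem's unit tests. The article does not state which metric or sampling setup underlies the comparison, so the following is only a sketch of the standard unbiased estimator published with HumanEval, not a reproduction of the reported numbers. The function names `pass_at_k` and `benchmark_score` are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that pass all unit tests
    k: sampling budget being evaluated
    Computes 1 - C(n - c, k) / C(n, k). When n - c < k, comb() returns 0,
    so the estimate is exactly 1.0 (every k-subset contains a passing sample).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over all problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With a single sample per problem (n = 1, as with greedy decoding), pass@1 reduces to the fraction of problems solved, so `benchmark_score([(1, 1), (1, 0), (1, 1), (1, 0)], k=1)` evaluates to 0.5.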

In summary, the project contributes WizardCoder, an enhanced Code LLM that outperforms other open-source models on code generation tasks, and it demonstrates the importance of code-specific instruction data for improving Code LLMs.

Sources:
– Paper: [not provided]
– GitHub: [not provided]
– Aneesh Tickoo, MarktechPost

The post The Power of Code-Specific Instruction Data in Improving Code Language Models appeared first on TS2 SPACE.