
Machine-learning experts can earn up to $160,000 with these skills




The Secret Sauce Of AI Companies: The Importance Of Human-Annotated Data In High-Performing Machine Learning Models

Founder and CEO at Stealth Scaling.


Artificial intelligence (AI) has made significant strides in recent years, largely due to one crucial ingredient: data. Among the myriad types of data available, human-annotated data stands apart for its unique value in training machine learning models. This post explores the pivotal role of human-annotated data, provides practical advice on best practices when employing it, and guides decision-making between in-house labeling and vendor partnerships.

Understanding The Role And Relevance Of Human-Annotated Data

The world of machine learning revolves around data. The "garbage in, garbage out" (GIGO) principle underscores the importance of the quality and structure of data: Superior input leads to superior output. The best machine learning models are those trained on the richest, most meaningful data. Enter human-annotated data, which brings a layer of human understanding that machines alone cannot replicate. This creates a more sophisticated learning pathway for AI and is instrumental in developing superior machine learning models.

Why Consider Human-Annotated Data?

Human-annotated data yields notable advantages. It provides a nuanced understanding of data, enabling models to navigate the complexities of human language and behavior more effectively. It also facilitates error identification and correction, enhancing the overall data quality. Finally, human-annotated data allows a high degree of customization, ensuring AI models can be fine-tuned to their intended applications.

Practical Advice For Employing Human-Annotated Data

Maximizing the benefits of human-annotated data requires adhering to certain best practices:

• Select the right annotators: Quality annotations necessitate skilled annotators. Experience and domain-specific knowledge are vital traits to seek in your annotation team. Ensure adequate training and support are provided to uphold the highest standards of labeling.

• Ensure data diversity: Your data should represent the range of scenarios where your AI will operate. The broader the range, the better the model's performance. Avoid biases by encompassing diverse sources, geographies and demographics in your data collection process.

• Maintain quality standards: Regularly review the annotated data to keep up the quality. Implement a robust quality control process that includes cross-validation among multiple annotators and routine error checks.

• Provide clear annotation guidelines: Develop comprehensive and unambiguous guidelines to aid the annotators. Details regarding the classification categories, examples of correctly labeled data, and an explanation of common pitfalls can drastically improve annotation quality and consistency.

• Uphold ethical and privacy norms: Respect for ethical guidelines and data privacy laws is nonnegotiable. Ensure transparency and informed consent in data collection, and use anonymization techniques to protect privacy.
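The cross-validation among multiple annotators recommended above can be made quantitative with an inter-annotator agreement metric. Below is a minimal sketch of Cohen's kappa, a standard statistic for two-annotator agreement; the function name and the sample labels are illustrative, not from any particular annotation tool:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Inter-annotator agreement between two annotators (Cohen's kappa).

    labels_a, labels_b: parallel lists of labels the two annotators
    assigned to the same items. Returns a value in [-1, 1]; 1.0 means
    perfect agreement and 0.0 means agreement no better than chance.
    """
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # degenerate case: only one label in use
        return 1.0
    return (observed - expected) / (1 - expected)

# Two annotators label the same five customer reviews:
a = ["pos", "neg", "pos", "pos", "neg"]
b = ["pos", "neg", "neg", "pos", "neg"]
print(round(cohens_kappa(a, b), 3))
```

A low kappa on a sample of double-annotated items is an early warning that the guidelines are ambiguous or that an annotator needs retraining.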

Choosing Between In-House Labeling And Vendor Partnerships

The choice between employing in-house labelers and opting for an external vendor is a key strategic decision.

• In-house labeling: This approach provides direct control over the data annotation process, which is critical when dealing with sensitive information or domain-specific knowledge. However, it may pose scalability challenges for larger projects.

• Vendor partnerships: Partnering with a vendor affords scalability, access to an experienced workforce and sophisticated annotation tools. Vendors who utilize machine learning models for preliminary annotations, complemented by human expertise for refinement, can create a powerful synergy that ensures high-quality annotations.

When choosing a vendor, evaluate their understanding of your AI use case, the robustness of their quality assurance process and their commitment to data privacy.

The Key To Ensuring Quality: Human Review And Multiple Rounds Of Inspection

A critical component in the data annotation process is quality assurance. Despite the advances in machine learning and AI, the human touch in reviewing and verifying data remains paramount. The complex nuances, context understanding and creative interpretations that human intelligence brings to the table can't be entirely replicated by machines.

Even with the most skilled annotators, errors can occur. Misinterpretations, mislabeling or simply overlooking data points are common pitfalls in the annotation process. A system of human review as a quality check helps to identify and correct these errors.

Conducting multiple rounds of review further enhances the quality of your annotated data. Each review round can catch different types of errors. Initial rounds might focus on glaring mistakes or inconsistencies, while subsequent rounds can delve into finer details and more subtle inaccuracies. This iterative process of review and refinement aids in creating a high-quality, reliable data set.

Additionally, it can be beneficial to employ a multiple-annotator system, where more than one person annotates the same piece of data. Discrepancies in annotations can be identified and addressed, contributing to a more accurate and reliable data set.
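One simple way to operationalize a multiple-annotator system is majority voting with a review flag for items where annotators disagree. A minimal sketch, with illustrative function and queue semantics:

```python
from collections import Counter

def aggregate(annotations):
    """Majority-vote a final label from several annotators' labels.

    annotations: list of labels for one item, one per annotator.
    Returns (label, needs_review), where needs_review is True when no
    label wins a strict majority, so the item should be routed back
    for another round of review.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    needs_review = votes <= len(annotations) / 2
    return label, needs_review

print(aggregate(["cat", "cat", "dog"]))   # clear majority
print(aggregate(["cat", "dog", "bird"]))  # no majority: flag for review
```

Items flagged for review are exactly the discrepancies the text describes, and they double as candidates for clarifying the annotation guidelines.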

To sum up, the human review and multi-round inspection process plays a crucial role in maintaining high-quality, human-annotated data. It acts as a safety net, catching and rectifying errors that might otherwise impede the performance of machine learning models. A rigorous, well-implemented review process can be a game changer in the world of AI, ensuring your data is of the highest quality and your AI models perform at their best.

As AI continues to evolve, human-annotated data is set to gain even more importance. By following these best practices and making a well-informed decision between in-house and vendor resources, AI companies can extract maximum value from human-annotated data, leading their machine learning models to achieve peak performance. Remember, the key lies not just in harnessing data but in expertly annotating it—the secret sauce of successful AI companies.

Human annotation bridges the gap between raw data and meaningful information. It brings human understanding and contextual interpretation to the table, elements that machines cannot fully replicate. This enriches the data, enabling machine learning models to navigate complexities and ambiguities that would otherwise be challenging. From error identification and correction to the creation of highly customized data sets, the advantages of human-annotated data are manifold.

In conclusion, human-annotated data stands at the forefront of high-quality data production. Its pivotal role in crafting superior machine learning models makes it the secret sauce for successful AI companies. The path to the best AI is laid with the bricks of the best data, and human annotation is the key to unlocking this potential. As we continue to push the boundaries of what AI can achieve, the value of human-annotated data will only continue to grow.

Forbes Communications Council is an invitation-only community for executives in successful public relations, media strategy, creative and advertising agencies.




Low Code AI With Power Apps And Power Automate

Low-code and no-code software development platforms were developed to enable so-called citizen makers (also known as power users and non-professional programmers) to create professional applications. But historically, such efforts often stalled without the participation of programmers and database administrators.

Microsoft has been banging away at this problem for decades, going back to Excel. After focusing on AI and machine learning capabilities in Microsoft Azure for the past couple of years, the company is now adding generative AI to the mix, thanks to a large investment in OpenAI that has made ChatGPT/GPT-4 available to Azure users.

AI Builder in Power Apps and Power Automate

Microsoft recently folded generative AI capabilities into the AI Builder section of Power Apps and Power Automate. As you can see in the figure below, AI Builder is at the top of Microsoft's AI stack, drawing on the capabilities present in Azure AI Services—the domain of professional developers—and making them available to citizen makers within Power Apps and Power Automate.


Microsoft's AI stack has three layers. The Azure ML Platform is for building AI models; Azure AI Services is for professional software developers who need to use or customize the functionality of those AI models; and AI Builder lets citizen makers consume the models with no programming experience required.

Generating applications with Copilot

Microsoft and GitHub's "copilot" branding made sense when it applied only to an AI pair programmer inside code editors. Now that it extends to Windows 11, Microsoft 365, and Power Platform, I'm less convinced. It smells like marketing.

That said, there are two major use cases for AI Copilot in Power Platform: generating applications and using GPT for specific flows or focused tasks. We'll concentrate on application generation for now and look at specialized GPT flows later on.


The Power Apps home screen now offers a text-based "Let's build an app" option at the top of the page, which uses GPT. To see this feature while it's in preview, you have to enable the preview, wait, and possibly create a new development environment and refresh the screen a few times. If you want to revert to the old home screen, use the toggle at the top right of the screen.

There is less to the application generation process than meets the eye. At the moment, it takes you to a "Here's a table for your app" screen, which is consistent with the way Power Apps has always generated applications from tables.


The table screen shows a simple proposed table with a Copilot box at the right. The suggestions shown at the lower left currently do little more than regenerate the sample table; merely clicking a suggestion does nothing. If you type "add more rows" in the Copilot text box, you will see a larger table, but the contents may change randomly—in my case, from school supplies to fruits.

The actual code generation happens after you click the Create app button on the lower right side of the screen.


Once you've generated the basic application, you can explore its screens, components, and their properties, modify them as you wish, and add more data. You can navigate on both the left and bottom of the screen, add and edit at the far left and top, and edit properties on the right.


If you navigate to the main screen, you will be able to change screen-wide properties such as the theme we're changing here. Pressing the triangle at the top right allows you to preview the application.


The default app preview uses a web layout. The dropdowns at the top right allow you to choose different form factors.


The phone dropdown is in alphabetical order by brand, so of course it starts with Apple iPhone models. Farther down are Motorola, Samsung, and Xiaomi.


The iPhone previews include a shell image as well as the screen contents. The previews are dynamic and functional. This is a detail screen I got to from the scrolling list of items. Note the edit and delete icons at the top right.


In horizontal mode on a phone, the application displays the scrolling list in the left-hand column and the detail screen in the right-hand column.


The horizontal tablet preview gives you enough space to see the whole detail form as well as the scrolling list.

Power Apps prebuilt AI models

Power Apps currently offers 17 AI models that you can use to create flows to embed in applications. We'll look at each of them below.


The Power Apps AI models include processing for various kinds of documents, for example, invoices, receipts, and identity documents. They also include text generation, sentiment analysis, translation, and other text processing functions, as well as time series predictions.

Azure OpenAI Service / Text Generation / GPT (preview)

The "Create text, summarize documents, and more with GPT" preview service is the newest jewel of Power Apps. While a few of these capabilities duplicate other services, for example sentiment analysis, most of them are valuable additions to the Power Apps arsenal.


The sample templates are guides to prompts known to work with GPT. You're not restricted to these capabilities, however: It's not that hard to write a prompt from scratch.


Creating a model with the GPT service is essentially an exercise in prompt engineering, and the interface allows you to test your prompts on a variety of inputs. Here, I've used several paragraphs from my 2021 article on Azure AI and asked GPT to summarize the text. It did a fairly good job. Notice the instruction "without adding new information," which is intended to keep GPT from bringing in material it has seen elsewhere, or worse, hallucinated.
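The "without adding new information" guard is a general prompt-engineering pattern worth reusing. A minimal sketch of assembling such a prompt as a string template—the wording is illustrative, not Microsoft's actual template:

```python
def build_summary_prompt(text, max_sentences=3):
    """Assemble a summarization prompt with an anti-hallucination guard.

    The key pattern is an explicit instruction constraining the model
    to the supplied text only; the exact phrasing here is illustrative.
    """
    return (
        f"Summarize the following text in at most {max_sentences} sentences, "
        "without adding new information that is not present in the text.\n\n"
        f"Text:\n{text}\n\nSummary:"
    )

prompt = build_summary_prompt("Azure AI offers prebuilt and custom models.")
print(prompt)
```

The same template can then be tested against a variety of inputs, which is essentially what the AI Builder prompt-testing interface lets you do interactively.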

Business card reader

The business card reader is one of the many AI Builder services that draw on Azure AI Services. More of these follow.


As you can see from the image, the business card reader performs OCR on business cards and both extracts and labels all the common fields.

Category classification (preview)

The category classification service reads text in any of seven languages and applies a prebuilt model to classify customer feedback into predefined categories. The current categories are Issues, Compliment, Customer Service, Documentation, Price & Billing, and Staff.

Entity extraction

Entity extraction can use a prebuilt or custom model to extract entities from free text in any of seven languages. There are 25 supported entity types in the prebuilt model.

ID reader

The identity document reader prebuilt model extracts information from passports, US driver's licenses, US social security cards, and US green cards. 

Invoice processing

Processing invoices requires handling tables of line items as well as global values.

Key phrase extraction

Key phrase extraction is a way to extract the main talking points from a free text document. Unlike an entity extraction model, key phrase extraction identifies whatever's in the text rather than looking for specific words and phrases.

Language detection

Language detection is often the first stage of a text processing flow. Once you know the language of a document, you can go on to analyze its sentiment, extract key phrases, and translate it to another language.

Receipt processing

Like invoice processing, receipt processing has to handle lists of line items as well as global values. Dealing with crumpled receipts is a common use case.

Sentiment analysis

Identifying the sentiment of text can be a useful way to control how the message should be processed further. Positive sentiment might flow into a queue of endorsements for use by marketing, while negative sentiment might trigger a response from customer service.
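The routing logic described above is simple enough to sketch. The queue names and function below are hypothetical, not part of Power Automate; in practice this would be a flow that consumes the sentiment model's label:

```python
def route(message, sentiment):
    """Route a customer message based on its sentiment label.

    Positive feedback feeds a marketing endorsements queue; negative
    feedback triggers a customer service follow-up. Anything else goes
    to manual triage. All queue names are illustrative.
    """
    queues = {
        "positive": "marketing/endorsements",
        "negative": "support/follow-up",
    }
    return queues.get(sentiment, "triage/manual")

print(route("Love the product!", "positive"))
print(route("Still waiting on my refund.", "negative"))
```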

Text recognition

Text recognition is a generalized OCR process that tries to extract all the text from an image.

Text translation

The text translation model includes source language identification, so you don't have to invoke that separately, although if you know the source language you can specify it to skip the detection step. This text translation is rated as "real-time" and is limited to 10,000 characters at a time.
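A per-call character cap like the 10,000-character limit mentioned above usually means longer documents must be split client-side before translation. A minimal chunking sketch that prefers sentence boundaries—the function is illustrative, not part of any Microsoft SDK:

```python
def chunk_text(text, limit=10_000):
    """Split text into pieces no longer than `limit` characters,
    preferring to break after sentence-ending periods so that no
    single request exceeds the translation model's per-call cap."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind(". ", 0, limit)
        if cut == -1:    # no sentence boundary in range: hard split
            cut = limit
        else:
            cut += 1     # keep the period with the left-hand chunk
        chunks.append(text[:cut].strip())
        text = text[cut:]
    if text.strip():
        chunks.append(text.strip())
    return chunks

parts = chunk_text("One. " * 5, limit=12)
print(parts)
```

Each chunk can then be sent through the translation model separately and the results concatenated.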

AI Builder in SharePoint and Teams

The Microsoft Syntex service allows you to create AI Builder models in SharePoint. Syntex is a Microsoft 365 service that has even more text processing models than Power Apps.

To use AI Builder in Teams, install the Power Automate app in Teams. Then you can create flows to use from the AI Builder templates.

Conclusion

Microsoft now has an extensive set of low-code AI and ML capabilities built into the AI Builder section of Power Apps and Power Automate, currently on a preview basis. Microsoft's competitors in this space aren't sitting back and ignoring AI, however, and this snapshot won't be the final word in the field. 

Overall, Power Apps is shaping up to be a rather nice low-code development environment with the addition of the new AI and ML capabilities, although it's certainly not there yet. While the combination of Power Automate flows, AI, and Power Apps seems a bit random at first glance, it could turn out to be a powerful combination.

Copyright © 2023 IDG Communications, Inc.








This post first appeared on Autonomous AI.
