The challenges enterprises must overcome to train their AI

By Philip Miller, Customer Success Manager at Progress

The time is now for enterprises that haven’t already done so to explore the potential of AI for their organisations. A report by Accenture reveals that 73% of businesses rank AI as their top digital investment priority. But with the spiralling volumes of data organisations consume and store, enterprises must proceed with caution in how they implement AI across the organisation: bigger isn’t always better. Unchecked volume can feed one of the lesser-known dangers of AI: data bias.

It’s now more critical than ever that enterprises understand their people, their data and their organisation to navigate the pitfalls of this potentially world-changing technology. The old adage “garbage in, garbage out” couldn’t be truer in an AI context. Enterprises should therefore understand how Large Language Models (LLMs), Medium Language Models (MLMs) and the volume of data behind them govern accuracy, trustworthiness and transparency. Chief among the challenges to executing AI successfully is noisy data, which can harm a company’s performance: its forecasting, decision-making, resourcing and customer experiences.

The implications of noisy data in generative AI

Noisy data no longer means just ‘corrupt data’; it has become synonymous with any data that machines cannot read or interpret correctly, such as unstructured data. Those leading and influencing AI strategies need to understand where noisy data resides, and how it is linked to the sheer volume of data an AI needs to be trained, so that technology can be used in the right way to make sense of it.

The complexities of using larger and larger data sets for training AI 

Historically, our human brains have never had to deal with sets of 1,000 or even 10,000+ things. In contrast, AI technologies work with numbers far beyond our comprehension. A recent iteration of ChatGPT represents each word with 12,288 dimensions, each dimension capturing an aspect of the word (softness, frequency, register and so on) and holding the value ChatGPT ascribes to that property.
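
To make that scale concrete, here is a minimal sketch in Python of a word-embedding lookup. The vocabulary, random values and helper function are illustrative assumptions, not ChatGPT’s actual internals; only the 12,288-dimension figure comes from the text above.

```python
import numpy as np

EMBEDDING_DIM = 12_288  # dimensionality cited above for a recent ChatGPT model

# A toy embedding table: each word maps to one vector of 12,288 values.
# In a real model these values are learned during training, not random.
rng = np.random.default_rng(seed=0)
vocabulary = ["soft", "hard", "velvet", "granite"]
embeddings = {word: rng.standard_normal(EMBEDDING_DIM) for word in vocabulary}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """How closely two word vectors point in the same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each dimension is just one number; meaning emerges from all 12,288 together.
print(embeddings["soft"].shape)  # (12288,)
print(cosine_similarity(embeddings["soft"], embeddings["velvet"]))
```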

This matters because when you use larger and larger data sets for training AI, strange things happen. Large numbers play tricks on our minds, and much of the data isn’t meaningful or accessible to us. So, as we feed more data to an AI-based tool such as ChatGPT, the noise in the data can interfere with what we’re looking for as an output (the signal). And when we add data that is inaccessible, for instance unstructured data, the noise only increases.
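
One way to picture that signal-versus-noise dilution is a deliberately simple toy experiment, assuming nothing about any real LLM pipeline: as uninterpretable records are added to a corpus, the share of meaningful data any sample contains collapses.

```python
import random

random.seed(0)

# Toy corpus: a handful of "signal" records plus a growing pile of noisy ones.
signal = [f"invoice {i} approved" for i in range(10)]

def hit_rate(noise_count: int, trials: int = 1_000) -> float:
    """Chance that a random sample from the corpus is a signal record."""
    noise = [f"unparsed blob {i}" for i in range(noise_count)]
    corpus = signal + noise
    hits = sum(random.choice(corpus) in signal for _ in range(trials))
    return hits / trials

for n in (0, 100, 10_000):
    print(f"{n:>6} noisy records -> ~{hit_rate(n):.1%} signal per sample")
# As noise grows, the share of meaningful data the model sees shrinks sharply.
```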

Key considerations to effectively train your AI

There are some key considerations for enterprises to factor in when training their AIs to reduce the noise and make them as performant as possible. This means doing everything to ensure the AIs we create give us the most correct answers possible.

  • Data must be curated: Enterprises should focus deeply on the data they use to train their AIs. They should be cleaning, curating, harmonising and modelling their proprietary data before the AI even looks at it, so that the noise is reduced and the volume of data required is significantly smaller (a minimal cleaning sketch follows this list). This will not only remove most of the noise from the output, but also reduce the cost of training the AI, moving closer to an MLM.
  • Choose a semantic data platform: The data platform organisations use to achieve AI must be able to handle metadata, which is what a semantic platform does. It can extract facts from entities in the data, combine them with metadata, the data’s place in a taxonomy, the ontology around it and its links and relationships to other data, and harmonise everything into the correct canonical model for the AI (see the metadata sketch after this list).
  • Embed security: Since this data platform may be handling sensitive or third-party data, it’s important to have security built in. Enterprises need an auditable trail so that changes made to the data can be traced back to the source, should any issues arise when presenting those changes to the AI.
  • A scalable data platform: A scalable, multi-model data platform is essential, whether the model is an LLM, MLM or some other AI model. Optimising AI often requires multiple technologies, but stitching together different systems can result in a fragile architecture that is difficult to maintain and manage. A fully scalable data platform will evolve and change as new data and systems are added to it.
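
As promised in the first bullet, here is a minimal data-curation sketch in Python with pandas. The records, column names and cleaning rules are invented for illustration; the point is that harmonising, coercing and deduplicating all happen before any of this data reaches an AI.

```python
import pandas as pd

# Hypothetical proprietary records; in practice these would be loaded
# from the enterprise's own systems.
raw = pd.DataFrame({
    "customer": ["ACME Ltd", "acme ltd.", "Globex", None],
    "revenue":  ["1,200", "1200", "950", "n/a"],
})

curated = (
    raw
    .assign(
        # Harmonise each customer into one canonical form.
        customer=raw["customer"].str.strip().str.rstrip(".").str.title(),
        # Coerce revenue to numbers; unparseable values become NaN (noise).
        revenue=pd.to_numeric(raw["revenue"].str.replace(",", ""),
                              errors="coerce"),
    )
    .dropna()           # drop records the model could not interpret
    .drop_duplicates()  # remove duplicates that would skew training
)
print(curated)  # two clean rows remain: Acme Ltd and Globex
```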
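And for the semantic-platform bullet, a hedged sketch of what ‘extracting facts and combining them with metadata’ can look like, modelled here as simple subject-predicate-object triples. The entities, predicates and ontology labels are hypothetical, not a specific product’s API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

# Facts extracted from the data, enriched with taxonomy/ontology context.
facts = [
    Triple("Acme Ltd", "is-a", "Customer"),             # place in the taxonomy
    Triple("Acme Ltd", "located-in", "Manchester"),     # extracted fact
    Triple("Customer", "subclass-of", "Organisation"),  # ontology relationship
]

def related(entity: str) -> list[Triple]:
    """Everything the platform knows that links to this entity."""
    return [t for t in facts if entity in (t.subject, t.obj)]

for triple in related("Acme Ltd"):
    print(triple)
```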

Optimising AI requires technology and skills

It’s a tough challenge to fully grasp AI, LLMs and MLMs, and how data of any volume or shape can influence the output. But having the right data technology in place is fundamental to reducing the noise and making AI as performant as possible.

Forward-thinking organisations are already investing in enhancing the accuracy, transparency, trustworthiness and security of these AI systems and are embedding them in their businesses. But before getting out of the starting blocks with AI, organisations must have the skills required to train their AIs to provide the most correct answers possible. Getting this right brings increased security, improved trustworthiness, cost savings and more intuitive prompt creation and response understanding, delivering the right responses from AI and truly enhancing business operations and efficiency.


