Why Data Pipeline is Important for High ROI AI Products

Salesforce is said to be in the final stages of negotiations to acquire data-management software provider Informatica for $11 billion. To many, the deal is reminiscent of Snowflake’s acquisition of Neeva in May last year. 

“Salesforce potentially acquiring Informatica seems like a push to compete more with Snowflake,” wrote Astasia Myers, partner at Felicis. 

These acquisitions are in step with big-tech moves such as Google acquiring Looker, the data analytics startup, in 2019, and Microsoft acquiring ADRM Software in 2020 and investing in Rubrik in 2021. Rubrik, a data management company, is now targeting an IPO with a $5.4 billion valuation.

Databricks’ acquisitions of Arcion, MosaicML, and Okera are along similar lines, aimed at managing its data pipeline and expanding its generative AI capabilities.

Similarly, Salesforce’s possible acquisition of Informatica is aimed at strengthening its data capabilities, especially in data integration, quality assurance, and customer insights. It also underlines the importance of a data strategy and a smooth pipeline for building high-ROI AI products.

Contextualised Data is King

James Wu, partner at M12, highlighted in a recent post that a strong data pipeline and data-centric AI are important. That is why the venture fund also invested in Unstructured.io, with another data curation company in the pipeline. “Big data will continue to be the foundation, but contextualised data is king,” he said.

“We’re interested in the ‘AI-data feedback loop’ – we think better AI can analyse data to identify errors and inconsistencies, improving data quality for future models,” he explained, saying that cleaner data can also help in training superior AI models, like a cyclic loop. 
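The first half of that loop, flagging errors and inconsistencies before data reaches a model, can be illustrated with a minimal sketch. It uses plain rule-based checks as a stand-in for the AI-driven analysis Wu describes; the `Record` fields, the age range, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    # Hypothetical customer record; the fields are illustrative assumptions.
    customer_id: str
    age: Optional[int]
    country: Optional[str]


def find_issues(records: List[Record]) -> List[Tuple[int, str]]:
    """Return (index, issue) pairs for records that fail basic quality checks."""
    issues = []
    seen_ids = set()
    for i, r in enumerate(records):
        # Inconsistency: the same ID appearing more than once.
        if r.customer_id in seen_ids:
            issues.append((i, "duplicate customer_id"))
        seen_ids.add(r.customer_id)
        # Errors: missing or implausible values (the 0-120 range is an assumption).
        if r.age is None:
            issues.append((i, "missing age"))
        elif not 0 <= r.age <= 120:
            issues.append((i, "age out of range"))
        if not r.country:
            issues.append((i, "missing country"))
    return issues


def clean(records: List[Record]) -> List[Record]:
    """Keep only records with no flagged issue, ready for the next training run."""
    bad = {i for i, _ in find_issues(records)}
    return [r for i, r in enumerate(records) if i not in bad]


sample = [
    Record("a", 30, "IN"),
    Record("b", None, "US"),
    Record("a", 25, "IN"),
    Record("c", 200, ""),
]
issues = find_issues(sample)
cleaned = clean(sample)
```

In a real feedback loop, the flagged records would more likely be reviewed or repaired than simply dropped, and the checks themselves could be learned from data rather than hand-written.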

Naveen Rao, VP of generative AI at Databricks, also shared similar thoughts. “We at Databricks are very much about the lifecycle of data and GenAI working synergistically together. We demonstrated the power of our training platform by building DBRX with it and we used all the tools in Databricks. We believe in the power of all the components around the model that comprise the full system,” he said.

This points to the need for a sound data strategy if AI products are to deliver high ROI. Matthew Blasa, AI strategist and lead data scientist consultant, emphasises that since data is AI’s lifeblood, AI products need a continuous pipeline of clean data.

“It’s important to ensure that your data is reliable, relevant to the large needs, and collected from multiple sources,” Blasa explained. “Without a clear data strategy, creating a model with enduring value is challenging. Relying solely on retraining and monitoring won’t close the gap. It may even make it harder.”

Crawling, walking, and running with AI

AI advisor Vin Vashishta recommends a phased plan for companies building AI products. “One thing that I’ve learned after a decade of building data and AI products is that businesses must crawl-walk-run with AI,” he wrote in a post.

Crawling is about collecting data, walking involves using the data to create descriptive models, and running uses more advanced models such as predictive, prescriptive, and diagnostic ones. He explains how starting with crawling and walking makes running less expensive and faster in the long run. 
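As a rough illustration of the three phases, the sketch below collects a few observations (crawl), summarises them (walk), and fits a simple least-squares line to make a prediction (run). The spend and revenue figures are made up, and the function names are hypothetical; this is one possible reading of the phases, not Vashishta's own implementation.

```python
from statistics import mean

# Crawl: collect raw observations (made-up monthly ad spend vs. revenue).
observations = [
    {"month": 1, "spend": 10.0, "revenue": 25.0},
    {"month": 2, "spend": 20.0, "revenue": 45.0},
    {"month": 3, "spend": 30.0, "revenue": 65.0},
    {"month": 4, "spend": 40.0, "revenue": 85.0},
]


def describe(rows, field):
    """Walk: a descriptive summary of one field."""
    values = [row[field] for row in rows]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(rows, x_field, y_field):
    """Run: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [row[x_field] for row in rows]
    ys = [row[y_field] for row in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


summary = describe(observations, "revenue")
model = fit_line(observations, "spend", "revenue")
forecast = predict(model, 50.0)
```

The predictive step reuses the same collected rows as the descriptive step, which is the sense in which each phase builds on the one before it.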

Each phase offers immediate benefits and builds on the previous phase, creating a solid foundation. “Walk and run handle about 90% of use cases, reducing time to value,” Vashishta explained.

In another post, Vashishta explained how high-quality data can bring quick results, and descriptive models trained on it yield quarterly gains. These efforts lay the foundation for AI products and potentially larger returns. “Trash data trains trash models, but the business needs tangible returns in months, not years. Fixing the data doesn’t deliver them unless data teams and leaders take a product-first approach,” he added.

The Data Pipeline Strategy

It is clear that data availability is important for building the best generative AI products. This is why companies like Salesforce, Snowflake, Databricks, and other data and AI providers are acquiring data companies. In the end, this gives them high-quality, streamlined data to improve their AI products.

AI products are data products. “Without a solid data strategy, it’s tough to trust the decisions made by our AI-driven products and keep them profitable,” said Blasa.

The post Why Data Pipeline is Important for High ROI AI Products appeared first on Analytics India Magazine.


