External database integration

Posted on Sep 7 • Originally published at neuml.hashnode.dev

txtai provides many default settings to help a developer quickly get started. For example, metadata is stored in SQLite, dense vectors in Faiss, sparse vectors in a terms index and graph data with NetworkX.

Each of these components is customizable and can be swapped with alternate implementations. This has been covered in several previous articles.

This article will introduce how to store metadata in client-server RDBMS systems. In addition to SQLite and DuckDB, any SQLAlchemy-supported database with JSON support can now be used.

Install txtai and all dependencies.

Next, we'll install Postgres and start a Postgres instance.

Now we're ready to load a dataset. We'll use the ag_news dataset. This dataset consists of 120,000 news headlines.

Let's load this dataset into an embeddings database. We'll configure this instance to store metadata in Postgres. Note that the content parameter below is a SQLAlchemy connection string.

This embeddings database will use the default vector settings and build that index locally.

Let's run a search query and see what comes back.

As expected, we get the standard id, text, score fields with the top matches for the query. The difference though is that all the database metadata normally stored in a local SQLite file is now stored in a Postgres server.

This opens up several possibilities such as row-level security. If a row isn't returned by the database, it won't be shown here.
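The row-level security behavior described above can be sketched in a few lines. This is only an illustration of the idea, not txtai's internals: sqlite3 stands in for Postgres so the sketch runs anywhere, the `owner` column and the `visible_results` helper are hypothetical, and a plain WHERE clause plays the role that a database-enforced policy would play on a real server.

```python
import sqlite3

# Stand-in for the metadata database (the article uses Postgres).
# The owner column is a hypothetical addition used to simulate per-row access.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sections (id INTEGER PRIMARY KEY, text TEXT, owner TEXT)")
db.executemany(
    "INSERT INTO sections VALUES (?, ?, ?)",
    [
        (0, "Red Sox beat Yankees", "alice"),
        (1, "Yankees lose to Red Sox", "bob"),
        (2, "Red Sox rally late", "alice"),
    ],
)

# Hypothetical output of a vector index: (id, score) pairs, best match first
hits = [(1, 0.91), (0, 0.87), (2, 0.80)]


def visible_results(db, hits, user):
    """Keep only the hits whose rows the database returns for this user."""
    if not hits:
        return []

    ids = [uid for uid, _ in hits]
    placeholders = ",".join("?" * len(ids))
    rows = db.execute(
        f"SELECT id, text FROM sections WHERE owner = ? AND id IN ({placeholders})",
        [user, *ids],
    ).fetchall()

    # Rows the database did not return are dropped; index order is preserved
    text = dict(rows)
    return [
        {"id": uid, "text": text[uid], "score": score}
        for uid, score in hits
        if uid in text
    ]


results = visible_results(db, hits, "alice")
```

Here the index matched three records, but only the rows the database returns for the given user appear in the results, in the order the index ranked them.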
Alternatively, this search could optionally return only the ids and scores, which lets the user know that a record exists that they don't have access to.

As with other supported databases, underlying database functions can be called from txtai SQL.

Note the addition of the Postgres md5 function to the query.

Let's save and show the files in the embeddings database.

Only the configuration and the local vectors index are stored in this case.

As mentioned previously, all of the main components of txtai can be replaced with custom components. For example, there are external integrations for storing dense vectors in Weaviate and Qdrant, to name a few.

Next, we'll build an example that stores metadata in Postgres and builds a sparse index with Elasticsearch.

First, we need to define a custom scoring component for Elasticsearch. While we could have used an existing integration, it's important to show that creating a new component isn't a large level of effort (~70 lines of code). See below.

As with Postgres, we'll install and start an Elasticsearch instance.

Let's build the index. The only difference from the previous example is setting the custom scoring component.

Below is the same search as shown before.

And once again we get the top matches. This time though the index is in Elasticsearch. Why are results and scores different?
The results and scores differ because this is a keyword index that uses Elasticsearch's raw BM25 scores.

One enhancement to this component would be adding score normalization as seen in the standard scoring components.

For good measure, let's also show that the md5 function can be called here too.

Same results with the additional md5 column, as expected.

The last thing we'll do is see where and how this data is stored in Postgres and Elasticsearch.

Let's connect to the local Postgres instance and sample content from the sections table.

As expected, we can see content stored directly in Postgres!

Now let's check Elasticsearch.

Same query results as what was run through the embeddings database.

Let's save the embeddings database and review what's stored.

And all we have is the configuration. No database, embeddings or scoring files. That data is in Postgres and Elasticsearch!

This article showed how external databases and other external integrations can be used with embeddings databases. This architecture ensures that as new ways to index and store data become available, txtai can easily adapt.

This article also showed that creating a custom component takes little effort and can easily be done for a component without an existing integration.
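As a small illustration of the score normalization enhancement mentioned earlier, the sketch below rescales raw BM25 scores into the 0 to 1 range by dividing by the top score. This is one simple approach chosen for illustration; the standard txtai scoring components may use a different formula, and the `normalize` helper and sample scores are hypothetical.

```python
def normalize(results):
    """Scale raw (id, score) pairs into the 0..1 range relative to the best score."""
    if not results:
        return []

    top = max(score for _, score in results)
    if top <= 0:
        # No positive scores to scale against
        return [(uid, 0.0) for uid, _ in results]

    return [(uid, score / top) for uid, score in results]


# Hypothetical raw keyword scores, best match first
raw = [(10, 12.0), (42, 6.0), (7, 3.0)]
normalized = normalize(raw)
```

With scores scaled this way, keyword results become easier to compare against, or combine with, similarity scores from a dense index, which already fall in a bounded range.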


