
Understand what you run before publishing your (silly) benchmark results

Posted on Jul 19

Before wasting time and money on large cloud instances, a benchmark that compares database performance should start with understanding what you run. Different databases, even if all compatible from a SQL point of view, may require different settings and indexes. The default settings are rarely good for intensive activity: they may be defined for small environments, to allow a quick and easy start, and they may stay compatible with older versions without leveraging new features. If you read a benchmark report that uses all default settings, you can safely ignore its conclusions. There's a good chance that the authors tuned only the database they were paid to advocate.

To show that it is not difficult to understand what you run, at least in a PostgreSQL-compatible database, I'll look at the HammerDB benchmark connected to YugabyteDB. HammerDB has no specific code for it, but YugabyteDB is PostgreSQL-compatible (it uses PostgreSQL code on top of distributed storage and transactions).

Here is my docker-compose.yaml to start YugabyteDB and HammerDB, along with the yb-tserver.flags file it references to set up the YugabyteDB configuration. I start the containers and define aliases to run the HammerDB client and the YugabyteDB shell (the equivalent of PostgreSQL's psql). I run the HammerDB Command Line Interface with the alias above (hammerdbcli), configure my environment, set the PostgreSQL connection to YugabyteDB, and build the schema. I can then connect to the database with my ysqlsh alias and look at the tables and functions.

Don't waste your time loading huge amounts of data. What matters is data distribution and execution plans. That's enough to understand the scalability.
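The original compose file and flags file are not reproduced here. As an illustration only, a minimal setup of this kind could look like the following sketch (image tags, commands, paths, and flag values are my assumptions, not the author's actual files):

```yaml
# Hypothetical docker-compose.yaml sketch for a YugabyteDB + HammerDB lab
services:
  yugabytedb:
    image: yugabytedb/yugabyte:latest
    command: >
      bin/yugabyted start --background=false
      --tserver_flags=flagfile=/home/yugabyte/yb-tserver.flags
    volumes:
      - ./yb-tserver.flags:/home/yugabyte/yb-tserver.flags
    ports:
      - "5433:5433"     # YSQL, the PostgreSQL-compatible endpoint
  hammerdb:
    image: tpcorg/hammerdb:latest
    tty: true           # keep the container up; run hammerdbcli via `docker compose exec`
# yb-tserver.flags would contain, for example:
#   --ysql_pg_conf_csv=log_statement=all
```

The point is that the whole lab is reproducible: the database configuration lives in a flags file next to the compose file, so anyone reading the benchmark can see exactly which settings were used.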
When you know what you are running, and that you run it with a relevant configuration, you can scale with more data. Back in hammerdbcli, I run only one iteration to see the SQL statements, and compare the HammerDB log with the PostgreSQL log in YugabyteDB. I have logged all statements (setting log_statement='all' in --ysql_pg_conf_csv) so that I can see what is run by the application. I'm convinced that running a benchmark, and publishing the results, is useless without understanding what is run, and PostgreSQL, or PostgreSQL-compatible databases, make it easy to trace the SQL calls.

From the log of this small execution I can extract all the SQL statements that were run. I see two statements that I can easily reproduce in ysqlsh. When I run them on my lab, one thing is clear: it makes no sense to run a benchmark with this. 100 milliseconds for OLTP is probably not what we expect. But that's not because of the database; it is because we don't know what we run.

I want to look at the queries that are run within those stored functions. For this, I log all statements and their execution plans, with the output in my console to avoid going to the log file, and with all nested statements included, to get the SQL within the stored procedures. This is YugabyteDB, but it is the same as PostgreSQL. You don't have to learn something new, and that's the advantage of PostgreSQL compatibility.

I run the two calls again and all queries are logged. These queries are far from optimal, and they are probably very different from the applications you run today. Even with the HammerDB parameter pg_storedprocs=false, it runs all the code in stored PL/pgSQL functions. There are also some inefficient constructs in the SQL statements, which you can trace or find in the source code, like scanning the same table twice.
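The logging described above uses standard PostgreSQL settings, which also work in YugabyteDB. A sketch of such a session setup (assuming the auto_explain module is available, as it is in standard PostgreSQL builds):

```sql
-- Log every statement and its execution plan, including statements
-- nested inside PL/pgSQL functions, and echo the log to this console.
set log_statement = 'all';
load 'auto_explain';
set auto_explain.log_min_duration = 0;        -- log a plan for every statement
set auto_explain.log_analyze = on;            -- include actual rows and timing
set auto_explain.log_nested_statements = on;  -- include SQL inside stored functions
set client_min_messages = log;                -- show the log lines in the console
```

With this in place, a single call to one of the TPC-C stored functions prints every SQL statement it runs internally, together with its plan, which is exactly what you need before trusting any throughput number.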
There are efficient solutions for this in PostgreSQL and YugabyteDB: see how to select the first row of each set of rows in PostgreSQL through a variety of queries, clauses, and functions. In addition, even if you see prepared statements, they are used only for the calls to the functions, where planning time is not a problem. The queries within those calls are still parsed for each call: after running them, only the top-level call is prepared. The complex SQL queries will have to be parsed and executed for each execution. This may be OK because the parse time is fast (YugabyteDB caches the catalog tables on the local node) but, again, you need to know what you are measuring. Calling a stored function holding all the business logic through a prepared statement is probably not what you are doing with your application. Do you move to cloud-native databases to lift-and-shift applications designed in the 90's? That may not be a good idea.

HammerDB queries were ported to PostgreSQL from an old benchmark made for Oracle Database (Oracle forbids the publication of such benchmarks today), which was probably coded more than 20 years ago. TPC-C is the simulation of an old monolithic client/server application, which has nothing in common with modern applications. It was then ported to PostgreSQL, with some modifications made for some PostgreSQL-compatible vendors (HammerDB has settings like pg_cituscompat for Citus and pg_oracompat for EDB).

The YugabyteDB default settings assume modern applications. Running this as-is with an all-default configuration makes no sense. Those kinds of applications inherited from the past still exist. There are two solutions: review the code and make something readable and optimizable, or use the most advanced database optimizer features. There are two features that can help in YugabyteDB. Let's try them with this HammerDB benchmark, without changing the code.
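For the "first row of each set of rows" pattern mentioned above, PostgreSQL (and therefore YugabyteDB) can avoid scanning the same table twice with DISTINCT ON. A sketch using illustrative TPC-C-style names, not HammerDB's actual code:

```sql
-- Instead of one query to find max(o_id) per customer and a second
-- query to fetch that row, read the latest order per customer in one pass:
select distinct on (o_c_id) o_c_id, o_id, o_entry_d
from orders
order by o_c_id, o_id desc;
```

The ORDER BY drives which row of each group survives: rows are grouped by o_c_id and the first one in o_id descending order, i.e. the latest order, is kept. One scan replaces two.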
I enable those two features and run the same again. Now, the response time looks good even without rewriting the old code to use modern SQL. You can test all the HammerDB TPC-C use cases with this: look at the execution plans, adapt the indexes, and when you have good response times, then you can start to run multiple clients and look at the throughput.

I may write more about this HammerDB benchmark if I feel the need to run it on YugabyteDB. However, I recommend running something that is relevant to your usage of the database. I was motivated to look at this because I've seen a benchmark that compares a sharded database, Citus, with Distributed SQL databases, CockroachDB and YugabyteDB. It is the best example of publishing random numbers, for marketing purposes, without understanding what is run. They compared the throughput of Citus on Azure (branded as Cosmos DB for PostgreSQL) with High Availability off, to Distributed SQL databases, CockroachDB and YugabyteDB, with Replication Factor 3, resilient to zone failure. And they used the same instance sizes. Obviously, replicas use more disk, RAM, and CPU, and you need more instances for the same throughput. With Replication Factor 3, you need more resources, but you don't stop the application for upgrades or failures.



