
SQL Hammer (everything is a nail)

Everyone loves to talk about code, but it’s really databases that run the world. And not nerd-chic graph databases, or vector databases, or object stores, or whatever hyped-up new hoohah that Meta created this month. I’m talking about old school, relational, SQL databases. Every (every) (every!) company uses them. And for good reason — tables and columns and relationships do an excellent job of modeling things and processes in the real world.

To be fair, there are solid specialized use cases for the hoohahs too. But SQL is and will remain ubiquitous because it does its job. I’m actually of the opinion that everyone should learn SQL, not just because it’s a work superpower (I’m looking at you, Tracy) but because normalization is a useful strategy for thinking about all kinds of problems. Actually, that’s another article I should write — but not right now.

The “Direct SQL” Problem

Today, I want to share a little tool that fills a technology gap I’ve found at every company I’ve ever been at — an access-controlled, auditable way to directly execute ad hoc SQL queries for data analysis, troubleshooting, workflow support and data repair.

See, in the “normal” course of business SQL is an infrastructure component — not something folks interact with directly. Enterprise applications provide higher-level end-user and system interfaces that use SQL “under the covers” as a way to store and model data. These applications add important controls on top of the raw data store: making sure that access restrictions are enforced, applying business rules to keep things consistent, generating logs to support auditing requirements, and so on. All good stuff.

But inevitably, something happens that requires people to venture under those covers. Maybe the application just has a bug. Or more commonly, something needs to happen in the real world that the application wasn’t designed to handle — updating the shipping address for an order after it was placed, reprocessing a job after a network interruption, that kind of thing. Perhaps somebody needs metrics for a presentation and the data warehouse doesn’t have the right data. There’s always something.

Easy enough for somebody who knows SQL. Open up a database client or your SQL-aware IDE and you’re good to go. But there’s a problem! Every one of the guardrails provided by application logic is gone. No access control, no auditing, no double-checking that your query doesn’t (oops) accidentally delete a bunch of records. Not only that, but direct database connections typically require access to a machine running directly in a production network — another opportunity for things to go wrong.

It’s easy to say that this “shouldn’t” happen, but that’s just naïve, wishful thinking. Best to just acknowledge reality and at least put some controls in place that minimize risk and maximize capability. That’s exactly what SQL Hammer does.

Try it live!

Before we install anything, let’s take a look at SQL Hammer running on my trusty server at https://shutdownapps.duckdns.org:7083/. I’m assuming you know the basics of SQL syntax. If not, this app probably ain’t much use to you anyways. Log in with any GitHub account and you’ll see two entries in the “Connection” dropdown: “EV Counts in Washington State” and “Scratch DB for Demos etc.” First choose the EV connection, which contains information about EV sales in Washington State (sourced from data.wa.gov) in a single table “evs.”

Running Queries

Your account is granted read-only access to this connection, so you can’t run or save arbitrary SQL statements — you can only run queries that others have created and marked as “shared.” This feature makes direct SQL data safely available to less technical folks in your organization. Click on “EV counts by model year,” then the Run button. Magic!

The “Open as URL” button creates a link that can be bookmarked or shared to provide direct (but still authenticated) access to this query. “Save as CSV” downloads the data for further processing in a spreadsheet app like Microsoft Excel.

Queries can also be more dynamic, requiring user-provided input at runtime. Select and run “EV counts by make in year;” you’ll see that the target model year is listed as a parameter in the results area and applied to the query results. The default is 2023, but you can change this and “Refresh” to look at another year. Parameters are super-powerful, especially for shared queries.

Creating Queries

Now choose the “Scratch DB” connection from the dropdown at the top. In this account you’ve been granted full SQL access, which is a little scary for me, because TTP is a real phenomenon. Please behave yourself! Also, the database resets itself overnight so don’t expect your changes to stick around. Anyhoo.

First click the “New” button, then enter and run a query like “select * from stuff.” Give the query a name and click “Save” so you can use it again later. The query will only be visible to you unless you check the “Shared” box, in which case it can be run (but not edited) by anyone with access to the connection.

Add a parameter to your query by inserting a ? in the query itself, and providing a name in the parameters box — for example, “select * from stuff where label like ?” and a parameter “search”. Run the query again and you’ll be prompted to enter a search string — try something like “yo%” to see wildcard matching live and in color. Woot!
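Under the covers this is just ordinary parameter binding. Here’s a minimal Python/sqlite3 sketch (not SQL Hammer’s actual code; the “stuff” table contents are made up) of what happens when you run that query:

```python
import sqlite3

# In-memory stand-in for the demo "stuff" table; schema and rows are illustrative.
db = sqlite3.connect(":memory:")
db.execute("create table stuff (label text, num integer)")
db.executemany("insert into stuff values (?, ?)",
               [("yoyo", 1), ("yodel", 2), ("banjo", 3)])

# The saved query has one ? placeholder; the user-supplied "search" value binds to it.
query = "select * from stuff where label like ?"
rows = db.execute(query, ("yo%",)).fetchall()
print(rows)  # the 'yo%' pattern matches 'yoyo' and 'yodel' but not 'banjo'
```

Because the value is bound rather than pasted into the query text, the wildcard pattern can contain anything the user likes without breaking the statement.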

The parameters list should be comma-separated, and each entry in that list must have a corresponding ? marker in the query itself. SQL Hammer doesn’t do a lot of verification on this, so it’s easy to create syntax errors if you’re not careful. A parameters entry can have a default value by adding a colon to the name followed by the value. For example, try “insert into stuff values (?,?)” with a parameter list like “label:mylabel,num:0”.
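SQL Hammer’s actual parsing code isn’t shown here, but the rules above (entries split on commas, an optional default after a colon) are easy to picture in a few lines of Python:

```python
def parse_params(param_list):
    """Split a comma-separated parameter list into (name, default) pairs.
    A colon separates a name from its optional default value; a name
    with no colon has no default (None here)."""
    pairs = []
    for entry in param_list.split(","):
        name, sep, default = entry.strip().partition(":")
        pairs.append((name, default if sep else None))
    return pairs

print(parse_params("label:mylabel,num:0"))  # [('label', 'mylabel'), ('num', '0')]
print(parse_params("search"))               # [('search', None)]
```

Each pair in the resulting list lines up positionally with a ? marker in the query, which is why a mismatched count causes errors.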

More than just SELECT

You may have noticed that the query above was actually an insert statement, not a select. This is fine! Assuming you’re configured for write access to a connection, you’re free to execute DML or even DDL statements. You can share these queries too, which can be a great way to capture data from users when combined with parameter lists.

Auditing

With most direct SQL solutions, figuring out who ran what query, when, with what parameters, can be nearly (or actually) impossible. This isn’t just a practical business problem; in many cases it can be a legal one — especially for folks working in regulated environments like healthcare or banking. SQL Hammer to the rescue! If a connection has its “log_queries” column set to 1, every query statement and its parameters are logged to disk, together with the email of the user who executed it. Preserve this logfile and you’ll have no problem satisfying your auditors.
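To make that concrete, here’s a sketch in Python of the kind of record such a log contains. The JSON-lines layout below is my own illustration, not SQL Hammer’s actual format:

```python
import json
import time

def audit(user_email, connection, statement, params):
    """Append one audit record per executed query: timestamp, user,
    connection name, statement text, and bound parameter values."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user": user_email,
        "connection": connection,
        "statement": statement,
        "params": params,
    }
    line = json.dumps(record)
    with open("audit.log", "a") as f:
        f.write(line + "\n")
    return line

audit("tracy@example.com", "Scratch DB",
      "select * from stuff where label like ?", ["yo%"])
```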

Of course, these logs may now include highly sensitive information — so you’ll have to put processes in place to protect them from prying eyes or tampering. A pain, but well worth the benefits.

Installing SQL Hammer

SQL Hammer is 100% open source; the code is on GitHub and building it yourself is pretty straightforward. I’ll talk more about that later, but first let’s walk through a binary installation. I’ll be using Linux, but it’s all pure Java so no reason you can’t use Windows or a Mac if you prefer (do let me know if you run into any problems).

First make sure you’ve got a JRE v11 or above. You can check this by running “java -version” at the command line. Next, download sqlhammer-v1.zip from GitHub and unzip it to a directory on a machine that has network access to the database(s) you care about.

Execute “run.sh” in this directory, and point a browser on the same machine at https://localhost:3001. You’ll have to approve the self-signed certificate, but from there you should be prompted for a GitHub login and be on your way! Logs will be written to nohup.out in the same directory; that’s a good place to look for errors if something doesn’t work.

Very cool — but to run “for real” you’ll want to be sure your configuration is correct and secure, so read on. Don’t skip this!

SSL Certificates

The default config.json uses a self-signed localhost certificate for HTTPS. You’ll need to update the SSLCertificateX values to point at your own certificate and key files, using the same format as Apache’s SSLCertificateFile and SSLCertificateKeyFile. If you really, really want to run without HTTPS you can just delete these values, but that’s probably a bad idea. Remember to set the “Port” value as desired as well.
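As a sketch, the relevant config.json values might look like this (I’m assuming the key names mirror the Apache ones; the paths and port are illustrative, so substitute your own):

  "SSLCertificateFile": "/etc/ssl/certs/myhost.crt",
  "SSLCertificateKeyFile": "/etc/ssl/private/myhost.key",
  "Port": 7083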

OAuth2 Provider

As you’ve seen, the default is set up to authenticate users with GitHub accounts, using a stub OAuth2 application registered to my account (seanno). For obvious reasons, you’ll want to pick your own provider and application! If you want to use a social login like Google, Facebook, Amazon or GitHub you’ll find instructions in my recent blog post.

If you’d rather use your enterprise login (likely), you’ll have to figure out how to configure an OAuth2 / OpenID Connect application. Set the “Provider” in config.json to “other,” then provide values for ClientId, ClientSecret, AuthURL, TokenURL and possibly Scope. Most providers make this pretty easy; e.g., for Azure AD / Entra there are instructions here and a config.json fragment will look something like this:

"OAuth2": {
  "Provider": "other",
  "ClientId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "ClientSecret": "yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy",
  "AuthURL": "https://login.microsoftonline.com/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz/oauth2/v2.0/authorize",
  "TokenURL": "https://login.microsoftonline.com/zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz/oauth2/v2.0/token"
}

If you have trouble with this, first take a closer look at my Social Login post, and feel free to ping me if it’s still not working — happy to help if I can.

An important note on this: When SQL Hammer starts up the first time, the first person to log in is given full write access to the Metadata store. This will have happened when you first logged in with GitHub above. If your chosen provider is associated with the same email address you use on GitHub, all good. If not, you’ll want to start with a fresh metadata store — just delete the dss.sql file (by default in /tmp) after configuring your provider, then log in to recreate these tables.

Cookie Encryption

SQL Hammer uses the config.json values under “CookieEncrypt” to protect its login cookie. Since the default value is public on GitHub, anybody with access to your server could theoretically whip up a fake cookie to get access to your stuff. Not good.

In your installation directory, run “java -cp dss-server-1.0-SNAPSHOT.jar com.shutdownhook.toolbox.Encrypt keygen”. This will output a randomly-generated key; put it into the “CookieEncrypt.Key” field and you’re good to go.
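The resulting fragment is tiny; something like this (the value below is a placeholder, paste in your own keygen output):

  "CookieEncrypt": {
    "Key": "PASTE-KEYGEN-OUTPUT-HERE"
  }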

Metadata Store Location

The default setup stores the DSS database in /tmp; edit the Sql.ConnectionString value if you want it somewhere else.
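For example, a fragment like this would move it under /var/lib instead (the path is illustrative; any location writable by the server process works):

  "Sql": {
    "ConnectionString": "jdbc:sqlite:/var/lib/sqlhammer/dss.sql"
  }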

(Maybe) add JDBC drivers

By default, SQL Hammer includes JDBC drivers for SQLite, mySQL and PostgreSQL databases. If you use a database like Microsoft SQL Server, Azure SQL or Oracle, you’ll need to download those driver(s) to the same directory where you installed SQL Hammer. Then load the JAR(s) by adding them to the “cp” argument in run.sh, separated with colons. Here’s an example doing this for Azure SQL (mssql-jdbc-12.4.2.jre11.jar):

nohup java \
  -cp dss-server-1.0-SNAPSHOT.jar:mssql-jdbc-12.4.2.jre11.jar \
  com.shutdownhook.dss.server.App \
  config.json &

Finding the right JDBC driver is pretty easy; just search “XYZ jdbc driver download” on Google and you’ll get there in no time.

99% of JDBC drivers now self-register by including a META-INF/services/java.sql.Driver file in their JAR. If for some crazy reason yours does not, you’ll need to add its fully-qualified class name to config.json under Sql.PreloadDrivers, which should be an array of string values.
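In that (rare) case the fragment would look something like this; the driver class name below is made up for illustration, so use the one documented for your driver:

  "Sql": {
    "PreloadDrivers": [ "com.example.jdbc.LegacyDriver" ]
  }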

Managing Connections and Access

I’m particularly fond of this — admin tasks are performed not through a pre-built UX, but by using SQL itself. The first user to log into an installation is set up as the administrator, with access to the “DSS Metadata Store” connection. A few pre-built queries are added as well, but honestly the SQL is pretty simple; three tables drive it all. Just click the “Schema” button to see how they’re put together.

connections contains one row for each configured database connection. “name” is the primary key, just a short unique label. “description” is what shows up in the dropdown, and setting “log_queries” to 1 will cause query text and parameters to be logged for audit purposes. The star is “connection_string,” which contains everything necessary to connect to and authorize a specific database connection. SQL Hammer connection strings typically contain login information and are very sensitive — be careful to restrict access to the metadata store!

You can add connections using the “connections: add new” query or just with a simple insert statement. The harder part is figuring out a connection string that works; a great place to start is this article at Baeldung. A few examples also go a long way, so here you go:

  • SQLite: jdbc:sqlite:/PATH/TO/FILE
  • mySQL: jdbc:mysql://SERVER:PORT/DATABASE?user=USER&password=PASSWORD (port is usually 3306)
  • PostgreSQL: jdbc:postgresql://SERVER:PORT/DATABASE?user=USER&password=PASSWORD (port is usually 5432)
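Putting those pieces together, adding a connection is just one parameterized insert against the metadata store. Here’s a Python/sqlite3 sketch; the column names come from the description above, but the exact DDL and the example values are my own illustration:

```python
import sqlite3

# Stand-in for the DSS metadata store; DDL here is assumed for illustration.
meta = sqlite3.connect(":memory:")
meta.execute("""create table connections (
    name text primary key,
    description text,
    connection_string text,
    log_queries integer)""")

# The moral equivalent of the "connections: add new" query.
meta.execute("insert into connections values (?, ?, ?, ?)",
             ("reporting", "Read-only reporting replica",
              "jdbc:postgresql://db.internal:5432/reports?user=ro&password=SECRET", 1))

row = meta.execute("select * from connections where name = ?",
                   ("reporting",)).fetchone()
print(row)
```

Note the password sitting right there in the connection string, which is exactly why access to this table needs to be locked down.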

access holds (not surprisingly) rules about who can access which connections. The “user” field can either be a specific user (i.e., logged-in email address) or a wildcard pattern matched with the like operator. Some useful practical applications of this are “%” for matching any user, and “%@xyz.com” to match all users with email addresses at xyz.com.
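Because matching uses the like operator, you can check exactly how a rule will behave right in a database session. A quick Python/sqlite3 sketch of the matching semantics (the helper function is mine, not SQL Hammer code):

```python
import sqlite3

db = sqlite3.connect(":memory:")

def matches(pattern, email):
    """Evaluate an access-rule pattern against a logged-in email address
    the same way SQL's like operator would."""
    return db.execute("select ? like ?", (email, pattern)).fetchone()[0] == 1

print(matches("%@xyz.com", "pat@xyz.com"))   # True: domain wildcard
print(matches("%", "anyone@anywhere.org"))   # True: match-all rule
print(matches("%@xyz.com", "pat@abc.com"))   # False: wrong domain
```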

In order to execute or save arbitrary queries, a user must match an access row for the relevant connection in which the “can_create” column has the value 1. Without this flag, users can only run existing queries associated with the connection that are marked “shared” (and can’t see the underlying SQL statements).

The built-in queries “connections: grant access” and “connections: remove access” can be used to manipulate the table; they’re just parameterized versions of insert and delete.

You may find it useful to create multiple “connection” rows that reference the same database, associating different queries with each and granting access to different users. For example, you may have some users that are interested in queries related to financial data, while others are focused on user behavior. By segmenting these query sets by connection, you can reduce confusion and better conform to the principle of minimum access. Since folks only see the connections they have access to, navigation is no problem.

queries holds the actual statements for saved queries, including the “is_shared” flag that marks whether users other than the creator can execute them. Mostly this table is managed through the user interface, but there’s no reason you can’t manipulate it directly as well.

Building from source

The SQL Hammer frontend is an SPA built with React; the backend is a set of Java handlers that run in the context of my custom classes on top of the built-in HttpServer. All of this is packaged into a single uber-JAR that runs as a standalone process starting with com.shutdownhook.dss.server.App.

Building the package requires a git client, Java JDK v11+, maven, node and npm, most of which you probably have hanging around anyways:

git clone https://github.com/seanno/shutdownhook.git
cd shutdownhook/dss
./fullbuild.sh

The fullbuild.sh script first builds the React bundle in the client directory (I had to increase node memory with set NODE_OPTIONS=--max-old-space-size=4096) and copies the resulting files into a zip resource under server/src/main/resources. It then builds and locally installs the utility and server classes under ../toolbox, then the server components, and then finally assembles the uber-JAR which lands in server/target/dss-server-1.0-SNAPSHOT.jar. Use this JAR in place of the one in the release directory to run your own build.

If you want to actively play with the user interface, it’s possible to run the React SPA separately from the server. This is particularly nice because it gives you hot-reload. The details are a bit grotty to include here; drop me a note and I’ll get you started. If there’s any kind of broad interest I’ll write it up; just too lazy to do it now!

Quirks and Futures

This is most definitely a “v1” release. It works and works pretty well, at least for my purposes. But the UX is definitely awkward in places — the interplay between “Run” in the editing pane and “Refresh” in the run pane trips me up regularly. I’d like to have more ready-access to the schema information while running queries. The log file isn’t as configurable as it ought to be. “Setup” is a joke. And on and on.

But you have to start somewhere — and I’m hoping that the app will find its way to a few like-minded folks that have shared my pain and acknowledge the need for something like SQL Hammer. Direct SQL is not just inevitable, it’s super-powerful. It can even be an asset to agility and compliance when managed responsibly. I’d love to hear what you think!



This post first appeared on Shutdown Hook, please read the original post: here
