Scripting SQLite using Lua
by Ashwin Hirschi, Reflexis, April 2012.
- The binding basics
- Using the utilities
This note describes how to work with SQLite databases using Lua. While the Reflexis Flow platform provides several convenient ways to deal with databases from within application flows, you may occassionally want to work with SQLite databases directly.
Perhaps you need to create a specialised starter for your project. You may want to create your own worker script for a dedicated background task. Or you're simply busy wrapping a web service using Flow's lib/ws. If you need to work with databases, this note will tell you how to do it.
From here on we assume you have a working knowledge of Lua. If you don't, we heartily recommend Roberto Ierusalimschy's excellent Programming in Lua (3rd Ed.). Roberto is the main architect of Lua and his well-written book explains everything you need to know. Get it, you won't regret it.
As far as SQLite is concerned, you'll need to familiarise yourself with the SQL syntax it supports. The SQLite site contains lots of documentation plus syntax diagrams you may find convenient. However, as this note focusses on the scripting side of things, even a basic knowledge will get you by for now.
The binding basics
In this section we'll discuss binding basics. In Lua terms, a binding is a bridge or connection between parts. So when we say binding, we mean the glue (code) we created to connect SQLite to Lua. It's through this binding that you can access SQLite functionality by writing Lua scripts.
To work with SQLite you'll have to deal with databases and statements. Typically, you open a database, prepare one of more statements and then run (or in SQLite parlance: step) them. Afterwards, you'll have to finalize the statements and close the database, so any linked resources are freed.
Here's a snippet of Lua code to give you an idea:
The first line calls the sqlite function to open a database in write-and-create mode. If a database with this file name already exists, it will be opened. If it did not yet exist, an empty database will be created. Other mode options are "r" (for read-only) and "w" (for writing without creating). The sqlite function returns a database handle on success. Otherwise it returns nil, an error code and an error message.
Once you're holding a database handle you can perform several actions on it by calling its methods. The example above compiles a SQL statement to create a table by passing it to the prepare method of the database (on line 2 thru 8). Note that preparing a statement does not yet run it!
On success db:prepare will return a statement handle. On failure (e.g. if the SQL statement contains errors) it returns nil plus an error code and an error string indicating what's wrong. If anything goes awry, you can also call db:error to obtain the same error code and message.
Like database handles, statement handles provide several methods for you to work with them. Since we only need to run the prepared statement, we simply call the step method and then finalize to tell Lua we have no more need for this statement. To finish we call db:close so SQLite can clean up after us.
Similar to working with files through Lua's io library, if you do not finalize a statement or close a database, Lua's garbage collector will - after it has detected you no longer reference the handles - do this for you. However, we consider it good practice to call sm:finalize and db:close explicitly.
Now, let's consider a scenario where we need to insert, say, many thousands of records into a table. Creating this many separate statements, preparing, stepping and finalizing them would be inefficient. Therefore, SQLite offers us the option of preparing a single statement with placeholders (also called parameters). Before running the statement you can put values into these placeholders, using bind.
Here's an example to illustrate this approach (using the database we created earlier):
In the statement passed to the prepare function we can see that the placeholders start with semi-colons and are called :name, :email and :phone. After the prepare we create a simple helper function, named insert. Once this helper is defined, it is called 3 times, each time with a new table record.
The bind call inside the helper will clear the placeholders of our statement (note the lexical scoping of sm!) and traverse the key value pairs of the table to fill them. After the statement has been stepped, reset is called to reset the statement to its initial state. Now the statement's ready to be executed anew. Incidentally, reset does not clear any placeholder values. Use bind (or clear) for that.
Through our little helper, that binds the values from the table to the statement, before running and resetting it for re-use, the code remains quite readable. And since we're using placeholders we avoid the overhead of assembling, creating and destroying lots of statements.
Lastly, SQLite assigns each record in a database table a unique integer row ID. Since the person table in our example defines id as the primary key column, it will match this unique rowid. You can call the db:lastid function to fetch the row id of the most recent successful insert.
Before we move on to the querying examples, some words about transactions. The code above only inserts 3 records. Let's surround the 3 insert calls with a numerical loop that runs these inserts 100 times. Running this new script takes... a surprisingly long time! So, what happened?
If we run our code as is, SQLite treats each SQL insert as a separate transaction. However, as SQLite is an ACID database, it tries to guarantee that each transaction was processed reliably. This effectively puts an upper limit on the number of inserts that can be performed per second.
Fortunately, the solution to this problem is easy: we only need to create an explicit transaction and make the inserts part of it! There are several ways to instruct SQLite to start a transaction. Here, we'll simply use BEGIN to signal the transaction start, and END to indicate the final commit.
As we've seen above, the generic way to run simple statements is to prepare, step and finalize them. But there's a short form as well: db:exec. The db:exec method is limited. For instance, it won't accept placeholders. But it's okay for simple SQL statements like BEGIN and END. You can use it like this:
Running the script once more proves our problem is indeed solved! Later, when we'll discuss several utility functions, you'll find that sql/util contains 2 somewhat more robust transaction helpers:
To conclude the SQLite binding basics, we'll look at querying databases. In the previous examples we first created a database table and then inserted records into it. So, we've been putting stuff in, but nothing ever came out. Let's turn this one-way street into a two-way one!
Many of the methods we saw before, still come in handy: we can open a database, prepare a select statement, bind values to placeholders, and then step our statement. But, where do we go from there?
Well, if all went as planned, one of two things happens: either step signals that a row was found and can be read, or no new row could be found and you've reached the end of your query's result set.
Before we continue, let's look at an example:
The new statement method on the block is called row. In this example calling it returns an array holding the current row of the result set.
Comparing the result of sm:step() against the number 100 may look somewhat out of place... It is the so-called SQLITE_ROW constant that SQLite returns to signal a row is available. You don't need to worry about remembering it: the step + row combination is used so frequently, the binding provides a next method that combines these 2 functions (and hides the SQLITE_ROW check as well).
So, the above loop can be rewritten as:
Essentially, you can use the next method to step through the result set of your query, row by row. Furthermore, you can provide next with a data format parameter to indicate how you'd like to receive the row data. The format options are: sequence (i.e. the default, used above), record and values.
The loop in the example would look like this when using record format:
And here's the version using values as the data format:
The best data format to use with sm:next really depends on what you're doing with the query results in your own code. But if you're scanning through large amounts of data (like tens of thousands of records), the values format helps you avoid creating (and, indirectly, collecting) many Lua tables.
Reading records - The sequel
The values format example helps us introduce 2 new methods, namely db:records and sm:records. These methods can be used together with Lua 5's generic for construct and provides an efficient and easy means for querying your database.
Here's the db:records equivalent of our previous query example (in full!):
As you can see, using db:records results in an elegant querying mechanism, with little to no code for "housekeeping". The db:records method accepts 3 arguments: the query, a table with values to bind, and data format. As the example shows, the default data format is values.
The sm:records method works similar to db:records. But since you must prepare and finalize the statement yourself, you have the option of re-using it in several invocations of sm:records. Also, for obvious reasons sm:records only accepts 2 arguments: the table with values to bind and a data format.
Not nil but false!
If you've been trying out the examples we've presented so far, something has probably caught your eye by now: in result sets the NULL values are not represented by nils. Instead, a SQL NULL is represented using Lua's false value.
There are several reasons why false is a more natural fit for NULL than nil. First and perhaps foremost, while nils create holes in Lua 5's array tables (also called sequences), false values do not.
The separate "n" field that Lua 4 used to track the length of tables holding arrays, was dropped for Lua 5. This means that Lua 5 cannot reliably determine the size of a result row with a column containing a NULL, because the nil leads to undefined behaviour of the length operator.
A second reason why nils-for-NULLs is a bad idea, is Lua 5's generic for. This iterator mechanism is both elegant and powerful, as it enables you to hide complex mechanisms behind a simple front.
However, when using the generic for, a nil value assigned to the first loop variable signals the end of the iteration. Hence, if the first column of your result set can contain NULLs, iterating through this set may very well end prematurely.
This is why we eventually settled on using false values for NULLs. Not only does it fix the issues we just discussed, it also works out quite naturally in actual code. To illustrate, this (fairly silly) code prints out persons who do have a phone number but no email address:
Using the utilities
In the first part of this note we talked about the primitives you can use to talk to SQLite databases. You've seen that the records methods make it very easy to run your queries. We've also shown that statements can be executed using a combination of prepare, bind, step/next, reset and finalize methods.
In this second part we'll discuss several utility functions. As the term utility implies, we've designed them to be useful. If, however, other functions would suit you better, it should be trivial to write them yourself. Remember that the utilities always use the basic methods we discussed earlier.
To get access to the utility functions, simply require "sql/util". This will create a sql namespace table, holding a variety of functions to help you run your statements efficiently and without the need to write many lines of code.
The sql.run function should be easy to remember. It takes as arguments a database handle, a statement string and (optionally) a table with placeholder values, and runs the statement on the database. If all goes well, it returns true. Otherwise it returns nil and an error code.
By now, you should have an idea of how to implement sql.run yourself. It might, for instance, look something like this:
The actual code is slightly different (and later on we'll explain why). But this comes close enough.
Note that while db:records is very convenient to process a query result set, it is a bit overkill to have to write a for-loop if all you want is fetch a single value (or record) from your database. And since sql.run is not suited to run queries, we provide a companion function:
The sql.read function accepts a database handle, a query string, an optional table with placeholder values and an optional data format specifier. You might call it like this:
From this example you can deduce that the default value for the data format parameter is values.
While sql.read fetches a single record or value for you, sometimes you need to gather several values or records using a single query. Of course, this can easily be done using the db:records or sm:records functions. But we provide a small utility function, so you don't have to write the for-loop:
The sql.collect function takes the exact same arguments as sql.read does: a database handle, a query string, an optional value table and an optional data format. For example, to get all the phone numbers from the people in our database, we can simply do something like this:
The sql.run and sql.read helpers above are handy for quickly running a statement or reading a value. But, if you need to execute the same statement several times, it's not very efficient to keep preparing and finalizing that statement for each and every invocation. If you need to run the same statement multiple times, you'll find the sql.runner and sql.reader functions very useful.
Both sql.runner and sql.reader leverage the fact that Lua treats functions as first class values. That is, your scripts can pass functions along like it does strings or numbers. In other words, you can pass functions as arguments to other functions or functions may return functions as their result.
To give you an idea of how this works, we'll look at code that resembles the sql.runner function:
As you can see, sql.runner prepares the statement but doesn't run it. Instead, it returns a function it constructed on-the-fly. When called, this function runs the prepared statement with the data that you provide. The call to sm:reset ensures you can call the same function again with new parameters.
Let's try it out, while dispensing with the database and require logic for brevity's sake:
If you recall the binding basics examples, the code above somewhat resembles the example we used to introduce statement placeholders. However, in that example we constructed a helper (insert) function ourselves, while here the sql.runner function does all the work for us!
While sql.runner can help you run a single statement many times, the sql.reader function enables you to run the same query again and again (hopefully with different parameters ;-)). So, if we need an easy way to repeatedly fetch someone's email address, we might do it like this:
As should be clear by now, the multi-shot helpers effectively turn SQL statements into Lua functions. Though extremely convenient and efficient, there's really nothing magical about this. Handy tricks like these all come courtesy of the powerful features that Lua provides.
In the previous two sections we've explained a host of utility functions and shown you in what type of situation they can help you code effectively. We'll finish by mentioning some other useful helpers and features from sql/util.
When we introduced sql.run earlier, we said that the implementation shown differed from the actual implementation. The main difference is error reporting. When the sql/util functions fail to prepare a statement, they try to report what went wrong. This can help you detect errors early on.
By assigning a function to sql.print, you can install the function that will receive the error message. So, simply assigning print to sql.print will ensure errors in statements will automatically be printed. Alternatively, you can also install a logging function that stores any SQL errors your script encounters.
Of course, if you write your own utilities you'll have to re-implement this error reporting. To enable easy re-use of the reporting facility, sql/util exposes sql.prepare. Simply replace your own db:prepare calls with sql.prepare(db, stmt), and you automatically hook into the reporting system.
Since SQL injection attacks are running rampant these days, you should avoid assembling statements from scratch as much as possible. In our experience, using statement placeholders pretty much alleviates the issue. But if you really need to glue values into a statement, quote them using sql.quote.
Finally, we'd like to re-introduce two functions we've already mentioned in the basics section. The sql.start and sql.stop helpers signal the beginning and ending of a transaction. Both take a database as their first argument. But sql.stop takes an optional flag, that - if true - will roll back the transaction.
Well, we've reached the end of this "Lua plus SQLite" note. The purpose of this piece was to provide you with a clear picture of how SQLite databases can be scripted in Lua, using the binding we've created for our platform. We sincerely hope we've succeeded in that aim.
We believe the database and statement primitives themselves are easy to grasp. Better yet, people familiar with the original SQLite C API, should have recognised many method names. In our binding design we've tried to stick with the original nomenclature as much as possible.
The sql/util functions provide a convenient layer on top of our SQLite binding. Things have literally evolved over a period of nearly 7 years to become what they are today. And during that time these functions have proven their usefulness in numerous projects and products.
Alas, this note is not long enough to mention all the available features. For instance, our binding provides an integrated profiler that helps you track and analyse your database interaction, so you can tune your statements and optimise performance. This, however, will have to wait for another note.