A Complete Guide to Generated Columns in MySQL

A Complete Guide to Generated Columns in MySQL

·

7 min read

MySQL generated columns pose as a powerful, easy-to-use, and advanced tool for anyone who wants to add automatically generated data to their tables - in this blog, we will learn everything you need to know to master them.

Generated columns allow you to store automatically generated data in a table without using the INSERT and UPDATE clauses. This useful feature has been part of MySQL since version 5.7, and it represents an alternative approach to triggers when it comes to generating data. Also, generated columns can help you make your query easier and more efficient.

Let’s now learn everything you need to know to master generated columns in MySQL.

What is a MySQL Generated Column?

A generated column is similar to a normal column, but you cannot change its value manually. This is because an expression defines how to generate the value of a generated column based on the other values that are read from the other columns of the same row. So, a generated column works within the domain of a table, and its definition cannot involve JOIN statements. In other words, you can think of a generated column as a sort of view but limited to columns. Notice that generated columns are different from SQL triggers, and you can define them only when using CREATE TABLE orALTER TABLE statements with the syntax below:

generate_column_name column_type [GENERATED ALWAYS] AS (generation_expression)
[VIRTUAL | STORED] [NOT NULL | NULL]
[UNIQUE [KEY]] [[PRIMARY] KEY][COMMENT *string*]

The AS (generated_column_expression) clause specifies that the column you are adding or updating to a table is a generated column. The generation_expression defines the expression that MySQL will use to compute the column values, and it cannot reference another generated column or anything other than the columns of the current table. Also, notice that the generation expression can only involve immutable functions. For example, you cannot use the CURDATE() function that returns the current date in a generated column expression definition because it is a mutable function.

You can also precede AS with the GENERATED ALWAYS keywords to make the generated nature of the column more explicit, but this is optional. Then, you can indicate whether the type of the generated column is VIRTUAL or STORED. You will learn the difference between the two types in the chapter below. By default, if not explicitly specified in the query, MySQL marks a generated column as VIRTUAL.

Let’s now see the generated column syntax in action in a CREATE TABLE query:

CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(100) NOT NULL,
first_name VARCHAR(60) NOT NULL,
last_name VARCHAR(60) NOT NULL,
full_name varchar(120) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name))
);

In this example, the full_name column will automatically store the concatenation of the first_name and last_name columns.

What Types of Generated Columns Exist in MySQL?

As explained earlier, you can define a generated column as VIRTUAL or STORED. Let’s now take a closer look at these two types.

Virtual generated columns

MySQL does not store a generated column marked as VIRTUAL. This means that MySQL evaluates its value on the fly when required. This typically happens immediately after any BEFORE query is triggered. In other terms, a virtual generated column takes up no storage space.

Stored generated columns

MySQL stores any generated column marked as STORED. This means that MySQL takes care of evaluating its value and storing it on the disk every time you insert or update a row. In other terms, a stored column requires storage space as if it was a normal column.

Virtual generated columns vs Stored generated columns

Let's now learn more about the pros and cons of virtual and stored generated columns.

Virtual generated columns

Pros

  • Their creation is instantaneous because MySQL only has to change the table metadata.
  • They require no disk space.
  • INSERT and UPDATE queries come with no overhead because MySQL does not need to generate them.

Cons

  • MySQL has to evaluate them when reading a table, making SELECT queries involving them slower.

Stored Generated Columns

Pros

  • MySQL can read them as if they were normal columns, which assures fast retrieval with no overhead.

Cons

  • When added to a new table, MySQL has to rebuild the entire table.
  • INSERT or UPDATE comes with an overhead because MySQL has to generate them.
  • They require disk space. Also, notice that you can mix VIRTUAL and STORED columns within a table, and they both support MySQL indexes and secondary indexes. But as explained in the official documentation, secondary indexes on virtual columns take up less space and memory compared to stored generated columns. So, virtual generated columns are more efficient when it comes to secondary indexes.

Why Adopt Generated Columns?

There are several reasons to adopt generated columns, but the three below are the most important ones.

  • They provide you with cache capabilities to make your queries faster: generated columns give you the possibility to define columns containing useful data so you can then efficiently retrieve it whenever there is a need.
  • They allow you to simplify query expressions: instead of making your queries complex, you can spread the complexity over the generated columns and then use them in simple filter operations.
  • They empower you to define functional indexes: MySQL implements functional indexes as hidden virtual generated columns. In other terms, generated columns give you the possibility to define efficient and advanced indexes involving MySQL functions.

MySQL Generated Columns in Action in Real-world Examples

Let’s now see generated columns in action in some examples coming from my experience as a backend developer collaborating with data-driven startups.

Using a generated column to concatenate columns for consistency reasons

When developing the frontend of your application, you may notice certain patterns in data representation. For example, in sports, players on a team are generally identified with the following format:

first_name [first_carachter_middle_name.] last_name [(jersey_number)]

As you can see, you can easily generate this data field by aggregating the four columns with the following generated column:

string_identifier VARCHAR(255) GENERATED ALWAYS AS
(CONCAT(first_name, ' ', IF(middle_name IS NULL, '', CONCAT(LEFT(middle_name, 1), '. ')), last_name, IF(jersey_number IS NULL, '', CONCAT(' (', jersey_number, ')'))))
STORED

This would produce:

Cristiano Ronaldo (7)
Lionel A. Messi (10)

In this case, a generated column empowers you to standardize the data field format directly at the database level. In addition, a stored generated column avoids the inevitable overhead of constructing this field every time you need it.

Employing a generated column to automatically generate hashed IDs

Typically, you use the IDs of your resources in the URL of your website or REST APIs to retrieve the data you need. But publicly exposing your IDs could pose a security problem. That is especially true when you find yourself using autoincremental IDs, which are easy to predict and make scraping or bot attacks easier.

To avoid this, you can think about hiding your original IDs through the use of automatically generated, random, more secure public IDs. You can achieve this with a virtual generated column by hashing your IDs as follows:

public_id VARCHAR(40) GENERATED ALWAYS AS  SHA1(CONCAT("PLAYER", id)) VIRTUAL

Notice that to avoid generating known hash values, you can concatenate your ID with a special keyword. Learn more about MySQL encryption and compression functions here.

Defining a Generated Column to Simplify Data Filtering

When filtering data, there are columns that are more useful than others. Also, you often have to change the representation of the values stored in the columns to make filtering simpler or more intuitive. Instead of doing this in every filter query, you can define a helpful generated column storing the information you need to perform the filtering in the desired format.

For example, you could define a generated column to more easily find the players on a basketball team as follows:

filter_string VARCHAR(255) GENERATED ALWAYS AS
(CONCAT(first_name, ' ', last_name, IF(birthdate IS NULL, '', CONCAT(' ', DATE_FORMAT(birthdate, "%m-%d-%Y"))))
STORED

Such a column would produce:

LeBron James 12-30-1984
Stephen Curry 03-14-1988

This contains data useful for filtering and with the player's birthdate in the US format.

Generated columns vs. Triggers

As explained previously, you can only use generated columns within a table. Also, they can only involve immutable functions, and MySQL generates their values in response to an INSERT or UPDATE query. On the other hand, a trigger is a stored program that MySQL automatically executes whenever an INSERT, UPDATE or DELETE event associated with a particular table occurs. In other terms, triggers can involve several tables and all MySQL functions. This makes them a more complete solution compared to generated columns. At the same time, MySQL triggers are inherently more complex to use and define and also slower than generated columns.

Final Thoughts

MySQL generated columns are undoubtedly a great tool that can serve many uses and bring several benefits. Adopting them is easy and effective, although not many database developers use them. Either way, knowing how to take advantage of them may become essential to making your backend application faster, cleaner, and less complex. I hope that by explaining the use cases of generated columns through real-world examples I was able to help you gain more knowledge in the database world, and until next time.