How To Design a Multi-Language Database

Today we look at three best-practice database designs to store data in multiple languages and easily scale to new markets.

Reaching millions of users with an application is every developer’s dream. Achieving this goal becomes easier if users from all over the world can use your application. Since not all users know English or your target language, you need to design your application to be multilingual. This would not be possible without a multi-language database.

Designing a multi-language database that can easily scale to new languages is not easy. This is why you should rely on best practices. Here, you will see why you need a multi-language database, three great multi-language database designs, and how to choose the best one for you.

Let’s now learn everything you need to know about multi-language database design.

Why Do You Need a Multi-Language Database?

You never know how much your project will grow, and your small app for your friends could become an international service used by millions of people. If you want your product to be able to scale internationally, you need to design it so that you can easily adapt it to different cultures and markets. That is what internationalization is about.

Building a database ready for internationalization means designing a database that can store multilingual data. In other words, the backend should be able to provide data in multiple languages. To do this, the backend should connect to a multi-language database and retrieve the data from it.

Notice that giving users the possibility to switch between several languages is a nice feature to have. This is especially useful for polyglots or non-native speakers. So, even if your project is small and targets a local market, you should consider a multi-language database. After all, you cannot know in advance how successful your project will be and which users will use it. Let’s now learn why choosing the right multi-language database design for your needs is crucial.

Why You Should Design Your Multi-Language Database Carefully

Changing a database is a very costly operation in terms of time and energy because it triggers a ripple effect. Modifying the structure of the database involves changing the backends connected to it. This may also require you to adapt the frontends that rely on those backends accordingly. As you can see, changing a database structure is not an operation without consequences. For this reason, you should design a database that scales easily and that you do not have to change frequently.

As you can imagine, there are several ways to design a multi-language database, and each solution has its pros and cons. Considering the importance of the database structure for an application, you have to design your multi-language database wisely. This is why you should not start from scratch, but rely on best practices.

Let’s now take a look at some multi-language designs based on best practices.

3 Multi-Language Database Designs

Let’s delve deeper into the pros and cons of three multi-language designs I have personally used as a full-stack web developer working for startups all over the world.

1. Column Approach

In this approach, each field in a multilingual table has a number of columns equal to the number of languages supported by the database.

Specifically, this is what the column name template looks like: columnName_languageCode

Pros

  • Simple: it is easy to implement.
  • Fast: all translations live in the same row, so reading them does not require any JOIN.
  • Easy to deal with non-translated fields: if the translation for a field is missing, you can just use COALESCE. E.g. COALESCE(name_it, name_en) → returns name_it if it is not NULL, otherwise the name_en default value.

Cons

  • Hard to maintain: adding a new language requires updating all multi-language tables in the database. This also means that you need to change your ORM mappings accordingly.
  • Not scalable: the number of columns in tables grows with the number of languages supported by the application.
  • Complex SELECT conditions: forget about SELECT *. You need to specify each column in your SELECT clause.
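The column approach can be sketched with Python's built-in sqlite3 module and an in-memory database. This is a minimal illustration, not a reference implementation: the product table, its columns, the sample rows, and the product_names helper are all hypothetical names chosen for the example. It shows the columnName_languageCode template and the COALESCE fallback described above.

```python
import sqlite3

# Hypothetical setup: one name_<languageCode> column per supported language.
SUPPORTED_LANGUAGES = ("en", "it")
DEFAULT_LANGUAGE = "en"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id      INTEGER PRIMARY KEY,
        price   REAL NOT NULL,
        name_en TEXT NOT NULL,  -- default language, always filled in
        name_it TEXT            -- NULL until a translation exists
    )
""")
conn.executemany(
    "INSERT INTO product (id, price, name_en, name_it) VALUES (?, ?, ?, ?)",
    [(1, 9.99, "Chair", "Sedia"), (2, 24.5, "Table", None)],
)

def product_names(conn, language_code):
    """Return (id, name) pairs, falling back to the default language."""
    # Column names cannot be bound as query parameters, so the language
    # code is checked against a fixed list before it reaches the SQL text.
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language_code}")
    query = (
        f"SELECT id, COALESCE(name_{language_code}, name_{DEFAULT_LANGUAGE}) "
        "FROM product ORDER BY id"
    )
    return conn.execute(query).fetchall()

print(product_names(conn, "it"))  # [(1, 'Sedia'), (2, 'Table')]
```

Note that supporting a third language in this sketch would mean an ALTER TABLE on every multi-language table plus a change to SUPPORTED_LANGUAGES, which is the maintenance cost listed in the cons.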

2. Row Approach

In this approach, there is one row for each language. A multi-language entity is identified by the following composite primary key:

<id, languageCode>

Pros

  • Simple: it is easy to implement.
  • Fast: retrieving the translated content only requires a WHERE condition on languageCode.

Cons

  • Complex keys: using a composite primary key makes it harder to identify an element and to write JOIN queries.
  • Duplicated content: to simplify things, non-translated columns generally store the same content saved in the columns of the default language row. This means that you will have a lot of duplicate content in multi-language tables.
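Here is a minimal sketch of the row approach, again using Python's built-in sqlite3 module with an in-memory database. The product table, its columns, the sample rows, and the products_in helper are hypothetical and only serve the example. It shows the composite primary key, the single WHERE condition on the language code, and the duplicated content that appears when a row has not been translated yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id            INTEGER NOT NULL,
        language_code TEXT    NOT NULL,
        name          TEXT    NOT NULL,
        price         REAL    NOT NULL,
        PRIMARY KEY (id, language_code)  -- the <id, languageCode> key
    )
""")
conn.executemany(
    "INSERT INTO product (id, language_code, name, price) VALUES (?, ?, ?, ?)",
    [
        (1, "en", "Chair", 9.99),
        (1, "it", "Sedia", 9.99),  # price is repeated in every language row
        (2, "en", "Table", 24.5),
        (2, "it", "Table", 24.5),  # not translated yet: copy of the "en" row
    ],
)

def products_in(conn, language_code):
    """Return (id, name, price) for every product in the given language."""
    rows = conn.execute(
        "SELECT id, name, price FROM product "
        "WHERE language_code = ? ORDER BY id",
        (language_code,),
    )
    return rows.fetchall()

print(products_in(conn, "it"))  # [(1, 'Sedia', 9.99), (2, 'Table', 24.5)]
```

In this sketch, any other table that refers to a product has to carry both id and language_code, or decide which language row it points to. That is where the extra complexity in keys and JOINs comes from.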

3. Translation Table Approach

In this approach, each translatable column of a multi-language table is a foreign key to a translation table. In other words, there is a translation table for each table that involves multi-language fields.

Pros

  • Scalable: adding a new language does not involve changes to the database structure.
  • Supports historical data: you can use the translation table to keep track of the translation history for each field in an entity.
  • Centralized: all translations for an entity are stored in one place.

Cons

  • Complex queries: queries become more complex because the information associated with a multi-language entity is spread over two tables.
  • Slow queries: retrieving all the info associated with an entity requires as many JOINs as the number of translated columns. Considering the burden that JOINs have on performance, this can easily become a problem.
  • Duplicated tables: it increases the number of tables in your database, because you need two tables to define each multi-language entity.
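The translation table approach can be sketched as follows, once more with Python's built-in sqlite3 module and an in-memory database. All table names, column names, sample rows, and the product_details helper are hypothetical. To keep the example short, name_id and description_id are treated as logical references to product_translation.text_id and are not declared as enforced foreign key constraints, which is a simplification of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id             INTEGER PRIMARY KEY,
        price          REAL    NOT NULL,
        name_id        INTEGER NOT NULL,  -- points to product_translation.text_id
        description_id INTEGER NOT NULL   -- points to product_translation.text_id
    );
    CREATE TABLE product_translation (
        text_id       INTEGER NOT NULL,
        language_code TEXT    NOT NULL,
        content       TEXT    NOT NULL,
        PRIMARY KEY (text_id, language_code)
    );
""")
conn.execute("INSERT INTO product VALUES (1, 9.99, 10, 11)")
conn.executemany(
    "INSERT INTO product_translation VALUES (?, ?, ?)",
    [
        (10, "en", "Chair"), (10, "it", "Sedia"),
        (11, "en", "A wooden chair"), (11, "it", "Una sedia di legno"),
    ],
)

def product_details(conn, language_code):
    """Return (id, name, description): one JOIN per translated column."""
    rows = conn.execute(
        """
        SELECT p.id, n.content, d.content
        FROM product AS p
        JOIN product_translation AS n
          ON n.text_id = p.name_id AND n.language_code = ?
        JOIN product_translation AS d
          ON d.text_id = p.description_id AND d.language_code = ?
        ORDER BY p.id
        """,
        (language_code, language_code),
    )
    return rows.fetchall()

# Adding French only requires new rows; the schema stays the same.
conn.executemany(
    "INSERT INTO product_translation VALUES (?, ?, ?)",
    [(10, "fr", "Chaise"), (11, "fr", "Une chaise en bois")],
)

print(product_details(conn, "fr"))  # [(1, 'Chaise', 'Une chaise en bois')]
```

With two translated columns the query already needs two JOINs, and each additional translated column adds another. The inner JOINs used here also drop any product that lacks a translation in the requested language, so a real query would probably need LEFT JOINs and a fallback to the default language.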

What Is the Best Multi-Language Database Design for You?

As you can see, each multi-language database design has as many pros as cons, so no approach is preferable in absolute terms. You have to choose carefully based on your requirements, needs, and goals. I have personally used all three approaches in different projects, and the pros and cons above are the lessons I learned along the way.

The column approach is particularly useful when dealing with Big Data, that is, when your tables contain several million rows and you simply cannot afford JOIN queries or duplicated content. Although it is not the most scalable solution, it is the only one of the three designs that avoids both, which makes it the practical choice for very large tables.

On the other hand, the row approach is useful when the branches of your company that operate in local markets have freedom of action. In this case, duplication of content can even be an advantage, because each branch can manage the rows in its own language independently.

Finally, the translation table approach is the right design if you are looking for an elegant and scalable solution and do not mind the performance drawback. It is best for projects where you know that the data will not grow too large.

Conclusion

Designing a database that can store multilingual data is essential to help your business scale internationally. Also, it enables non-native speakers to use your application. Changing a database is a complex and time-consuming operation with non-negligible consequences on the application architecture. This is why you should design your database as multi-language right from the start. Here, we have looked at three different approaches to designing a multilingual database, studying their pros and cons, and going into detail about which one is best for you.