· Data Engineering · 3 min read
Best practice database column name/naming conventions
Introduction
Naming things. The hardest challenge in programming! I’m only slightly joking. Coming from a website development background, writing css class names on complex websites is probably the most difficult part of css. Luckily we don’t share quite the same problem with databases however naming is important.
The problems that I run into are mainly because the points below are not followed. I’m sharing these as general best practices however I believe that many should be part of the database design framework that should be reviewed and consistent across all databases/schemas in a business.
Choose a naming convention and stick to it everywhere.
It is generally a good idea to use snake_case
. This means all lowercase letters and underscores to separate words in column names. This makes the names easier to read and reduces the risk of naming conflicts. It really doesn’t matter if you choose PascalCase
or camelCase
. It’s opinionated and the one you choose doesn’t matter. The issues arrive when the naming convention differs between tables in the organisation.
Having this consistent may seem like a small thing, but between the two above points, if they’re correct and I know what I’m looking for I should intuitively know the column names without having to look them up.
Use descriptive, concise names
Column names should clearly and concisely describe the data that they contain.
As short as possible, while still being descriptive
Never use spaces. Ever. Please
It is generally a good idea to avoid using spaces in column names. Spaces can make it more difficult to work with the column names in SQL statements, and they may cause issues with some database systems. Don’t do it
Never use Special Characters
The only one allowed is underscores. No exceptions. There is never a reason.
Effective data analysis relies on clean, consistent, and well-structured data. In this post, we explore the different stages of data cleanliness, from raw data to production-ready data
Pick a name, stick to it everywhere
There is nothing worse than having the same thing with a different name in different tables. Pick a name, ensure it is used everywhere. In an organisation with large teams, multiple outlets, and lots of data sources, you’re identifiers are going to be called lots of different things in source systems. Ensure you have one name for this and it is the same everywhere, regardless of source system.
Do not use reserved words
There are a number of words that are reserved by various database systems and cannot be used as column names. Examples include “table,” “view,” and “index.” Be sure to check the documentation for your database system to see which words are reserved.
Use prefixes to group related columns
If you have a large number of columns in a table, it can be helpful to use prefixes to group related columns together. For example, you might use “customer*” as a prefix for all columns that contain customer data, and “order*” as a prefix for all columns that contain order data.
These are all the ones that tend to bug me day to day however I’m sure I will add to this list in the future :)
Here’s a handy list of all the best practices for database columns
- Choose a naming convention and stick to it everywhere. Default to
camel_case
- Use descriptive, concise names
- Never use spaces in column names. Ever. Please
- Never use Special Characters
- Same column in multiple tables? Pick a name, stick to it across all tables
- Do not use reserved words
- Use prefixes to group related columns in large tables. (Really helps autocomplete)