CODE SQL Server Database Standards
Mike Yeager January 2, 2020
In an effort to promote consistency, maintainability and good design, CODE has developed the following standards for SQL Server databases. Each project is different and in the end, the client always has final say, but we try to adhere to these standards where possible. We’ve found that having standards promotes faster, more efficient development and makes a significant difference to our clients’ happiness and to both of our bottom lines. Every technical decision has pros and cons. If you don’t agree with what’s listed here, we need to discuss it and potentially change the standards. If you feel we should do something different in a particular instance, we need to discuss it and make a call for that instance. The worst possible case is when one team member deviates from the standards and the rest of the team has to adjust to and maintain something unfamiliar and potentially detrimental to the job. In short, discussion good, lone ranger bad.
CODE Philosophy Towards Databases
Every situation is different and there are exceptions; however, we can address the most common scenario for us: the line-of-business application. The database’s primary job is to store and retrieve data and to maintain relational integrity. SQL Server is a terrible programming environment. In most situations, business logic should exist in software.
Table names should be singular, not use underscores, dashes or other special characters and should be Pascal cased. For example, Customer or CustomerHistory.
Column names should not use underscores, dashes or other special characters and should be Pascal cased.
Primary key, foreign key, index, constraint and other names internal to SQL Server are not strictly important, but should follow SQL Server’s standards and should be kept up to date. It shows attention to detail and pride in your work.
The primary key column for each table should be named Id. (Primary keys are covered later in this document). Foreign key columns should be named TableNameId. For example, CustomerId.
Every table should have a primary key. Primary keys should be a surrogate key (no meaning other than to identify a row in a table), and should be named Id. A single column Uniqueidentifier (Guid) should be used for primary keys. These columns should NOT be marked as the Clustered index for any table expected to contain more than 1,000 rows!
Making the primary key a clustered index is the default in SQL Server and you MUST manually turn this off when using Uniqueidentifiers as PKs, or performance will grind to a halt when the tables get large. NewSequentialId() should not be used. CODE does this for several reasons. Records can easily be created offline. Primary keys are unique across tables. Data sets can be combined, moved, imported, exported, etc. without losing referential integrity. SQL Replication and other data merging schemes become much, much easier.
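A minimal sketch of these standards, using a hypothetical Customer table: a Uniqueidentifier surrogate key named Id defaulting to NEWID(), with the primary key explicitly declared NONCLUSTERED to override SQL Server’s default.

```sql
-- Hypothetical Customer table (names are illustrative, not from a real project).
CREATE TABLE Customer
(
    Id   UNIQUEIDENTIFIER NOT NULL
         CONSTRAINT DF_Customer_Id DEFAULT NEWID(),
    Name NVARCHAR(100)    NOT NULL
         CONSTRAINT DF_Customer_Name DEFAULT N'',
    -- NONCLUSTERED must be stated explicitly; PRIMARY KEY alone creates a
    -- clustered index by default.
    CONSTRAINT PK_Customer PRIMARY KEY NONCLUSTERED (Id)
);
```

Note that the constraint names (PK_Customer, DF_Customer_Id) follow SQL Server’s conventional naming, per the earlier point about internal names.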
Do not choose clustered indexes for tables until performance tuning needs to be done. This is usually long after development is complete.
Whole books have been devoted to choosing an appropriate clustered index for a table. In general, tables of more than a few thousand rows can benefit greatly from having an appropriate clustered index. Making the wrong choice can be worse than not having a clustered index at all. DateTime columns make excellent clustered indexes in most cases. The determining factor when choosing the clustered index for a table is the ability to locate a single record or range of records quickly, so the choice is determined more by how the data will be queried most often and where the biggest performance bottlenecks end up being. While this is sometimes known early in a project, it’s often better to wait until usage patterns are known to make this choice.
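When usage patterns eventually justify it, adding the clustered index is a one-statement change. A sketch, assuming a hypothetical InvoiceHistory table queried mostly by date range:

```sql
-- Added during performance tuning, not during initial development.
-- A DateTime column queried by range is often a good clustering key.
CREATE CLUSTERED INDEX CIX_InvoiceHistory_CreatedOn
    ON InvoiceHistory (CreatedOn);
```

Because the primary key was created NONCLUSTERED, the table is free to take a clustered index on whichever column tuning reveals to be the best choice.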
Indexes and Performance Tuning
Do not add indexes, other than the primary key index to the database until performance tuning needs to be done. This is usually long after development is complete.
As with choosing a clustered index, entire books have been dedicated to this subject. Single column vs. multi-column, included columns, covered indexes, statistics and other factors are important in choosing indexes. Also, performance is highly dependent on the data in the database. Two companies running the same app with the same database structure could require vastly different indexes to get good performance. That said, there are often indexes that provide good performance in most instances and those can sometimes be applied generically to all databases of that type, but that can’t be determined accurately until there are enough data and usage patterns to test with.
Foreign key columns should be named using Pascal case as TableNameId. For example, a CustomerId column found in an Invoice table would be a foreign key pointing to a record in the Customer table. Foreign keys should always be defined in the database structure and never enforced only in code.
If the relationship is optional (for example, you can have invoices that are not linked to a customer, perhaps for walk-in clients), then the column should be nullable.
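An illustrative sketch of an optional relationship, using hypothetical Invoice and Customer tables: CustomerId is nullable because walk-in invoices have no customer, but the foreign key is still enforced by the database for rows that do link.

```sql
-- CustomerId follows the TableNameId naming convention and allows NULL
-- because the relationship is optional.
ALTER TABLE Invoice
    ADD CustomerId UNIQUEIDENTIFIER NULL;

-- The foreign key is defined in the database structure, never only in code.
ALTER TABLE Invoice
    ADD CONSTRAINT FK_Invoice_Customer
        FOREIGN KEY (CustomerId) REFERENCES Customer (Id);
```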
Stick to the following data types for most situations:
- DateTimeOffset(7) (when data must be synchronized across multiple time zones)
While SQL Server offers a lot of data types, these are the most common and are supported across almost every platform. They also map very well to and from .NET. Most business apps can be built using only these types. Disk space is relatively cheap. Saving a couple of bytes per row in a table with less than a million rows often ends up costing far more than it was worth.
A special note about strings. Strings in .NET are Unicode and using a Unicode string to do a search on a non-Unicode column in SQL Server leads to dreadful performance, so avoid using Char and Varchar. They don’t work well with .NET. Use NChar and NVarChar instead. NChar pads values with trailing spaces to the declared length; NVarChar does not, so NVarChar values never need trimming. For example, an NChar(50) field for company name where most company names are about 10 characters will not perform as well as an NVarChar(50) in most cases and must constantly be trimmed in code. Also, there is little performance penalty for longer NVarChar columns vs. shorter ones. For example, NVarChar(200) is not inherently slower than NVarChar(50). You should specify a size if possible, up to 4,000. Beyond 4,000, use NVarChar(MAX). MAX does incur a performance penalty over specifying a value (and MAX columns can’t be used as index keys), but sometimes it's necessary.
Null and Default Values
Only allow nulls in columns when necessary. Usually, nulls are only allowed in the following situations:
- A Foreign Key column where the relationship is optional.
For example, you can have invoices that are not linked to a customer, perhaps for walk-in clients.
- A Date, DateTime or DateTimeOffset column where a value may be missing.
Using a specific magic value, such as SQL Server’s minimum value is messy and error prone. If the data type is later changed to a different date type in SQL Server, that minimum value could change. Also, the minimum value will not map to the minimum value in .NET.
- The odd situation where NULL would not be treated exactly the same as an empty value.
In most business applications an empty value is treated exactly the same as a null value. Technically, having a NickName field that’s NULL means the nickname isn’t known: perhaps the data was never collected. Having an empty NickName means that we know for sure that the user does not have a nickname. In most business apps, nobody cares about the difference. The line is blank on the screen and prints out blank on reports. Avoiding NULL handling in SQL Server is a productivity benefit in most cases. Not having to say "WHERE NickName IS NULL OR LEN(NickName) = 0" or write code such as ISNULL(FirstName, '') + CASE WHEN LEN(ISNULL(NickName, '')) = 0 THEN ' ' ELSE ' (' + NickName + ') ' END + ISNULL(LastName, '') makes it easier to code and fewer mistakes are found later.
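To show the payoff, here is a sketch of the same full-name formatting when empty strings are stored instead of NULLs; no ISNULL guards are needed. Table and column names are illustrative.

```sql
-- With NOT NULL columns defaulting to '', formatting logic stays simple.
SELECT FirstName
       + CASE WHEN LEN(NickName) = 0 THEN ' '
              ELSE ' (' + NickName + ') ' END
       + LastName AS FullName
FROM   Person;
```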
The default value for Id columns should be newid(). The default value for nullable columns should not be specified. The default value for foreign key columns should not be specified. For every other column, a default appropriate to the data type should be specified. For example, 0 for Int or ‘’ for NVarChar(). This doesn't matter when working with Entity Framework for example, but makes a great deal of difference when entering a new row manually in SSMS or Visual Studio.
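A sketch of adding type-appropriate defaults to existing columns, so rows entered by hand in SSMS get sensible values. Table and column names are illustrative.

```sql
-- Numeric columns default to 0, string columns to an empty string.
ALTER TABLE Customer
    ADD CONSTRAINT DF_Customer_CreditLimit DEFAULT 0 FOR CreditLimit;
ALTER TABLE Customer
    ADD CONSTRAINT DF_Customer_Notes DEFAULT N'' FOR Notes;
```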
If values for a pre-defined list of selections need to be printed or returned as text to a non-.NET client (because they can’t make use of the .NET enums), a lookup table (or tables) should be maintained in the database.
Integers are often used to store selections from a pre-defined list of values. For example, Male, Female, Not Disclosed. As .NET developers, we often use enums as a friendlier way of working with those integers in code, but outside of our apps and whomever we share those enums with (as a .dll), they are just integers. For reporting in particular, we’re often not working in a .NET environment. If we have a lookup table in the database, we can join to it and print, “Male”, “Female” or “Not Disclosed” on our reports. Unfortunately, there is no silver bullet for the problem when you need to convert integers to text in different platforms. Every change we make to the list has to be made in every platform, or they get out of sync and issues arise. Our best attempt to solve this has been using enums in code and keeping a lookup table in the database in sync with those enums.
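A sketch of such a lookup table, kept in sync with a hypothetical .NET enum, so non-.NET clients such as report writers can join to it for display text:

```sql
-- Lookup table mirroring an enum; Id values must match the enum's integers.
CREATE TABLE Gender
(
    Id   INT          NOT NULL CONSTRAINT PK_Gender PRIMARY KEY,
    Name NVARCHAR(50) NOT NULL
);

INSERT INTO Gender (Id, Name)
VALUES (0, N'Not Disclosed'), (1, N'Male'), (2, N'Female');

-- A report query can then print the text rather than the raw integer.
-- Person and GenderId are illustrative names.
SELECT p.LastName, g.Name AS Gender
FROM   Person p
       JOIN Gender g ON g.Id = p.GenderId;
```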
Views should be avoided unless there is an unusual and compelling reason to use them.
While views can solve some interesting problems, they make the database more complex and add a maintenance cost. In addition, they can contribute to performance issues and are often re-used as a convenience in situations where much of the data is ignored, using up more resources than necessary. Only use views when they add a lot of value and try to keep the views as simple as possible. The vast majority of data access in business applications is very straightforward and can be programmed more efficiently without a lot of effort.
Stored Procedures and Functions
Avoid stored procedures and functions in SQL Server. Use them when they significantly cut down on the amount of data transferred to and from a client. Use them when functionality needs to be shared among applications that hit the database directly. Stored procedures that gather and pre-process the data for reports are a good example of when writing logic in a stored procedure can make sense.
Business logic generally belongs in code. There are instances where it’s much more appropriate to process large amounts of data on the server instead of transferring it all to the client and processing it there and that’s an excellent use of a stored procedure.
Most data access these days is through services, however there are notable exceptions. Many report writers need direct access to a database. Clients like Excel may need direct access to a database to pull external data. In these situations, it makes more sense to put the logic required to service these clients in the database. A stored procedure in a database that gathers and pre-processes all of the data needed for a report can serve an app hitting it through a service, a report writer and our Excel client. The stored procedure will be fast and efficient because the code runs very close to the data and the code only has to be written once.
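A sketch of such a reporting procedure, aggregating close to the data so only the summarized rows travel to the client. All table, column and procedure names are illustrative.

```sql
-- Pre-aggregates invoice totals per customer for a date range; the same
-- procedure can serve the app, a report writer and an Excel client.
CREATE PROCEDURE GetInvoiceTotalsByCustomer
    @FromDate DATE,
    @ToDate   DATE
AS
BEGIN
    SET NOCOUNT ON;

    SELECT c.Name       AS CustomerName,
           COUNT(*)     AS InvoiceCount,
           SUM(i.Total) AS TotalAmount
    FROM   Invoice i
           JOIN Customer c ON c.Id = i.CustomerId
    WHERE  i.InvoiceDate >= @FromDate
      AND  i.InvoiceDate <  DATEADD(DAY, 1, @ToDate)
    GROUP BY c.Name
    ORDER BY TotalAmount DESC;
END;
```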
When setting up database users, granting permissions to call stored procedures can be difficult. There is no default way to grant execute access to all stored procedures to a user. There are some ways to script something to grant that permission for each stored procedure, but it must be re-run every time a new stored procedure is added and in many cases whenever a database is moved to a new server.
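One way to script those grants, assuming a hypothetical CP_DbUser user; as noted, it has to be re-run whenever a new stored procedure is added.

```sql
-- Builds and executes a GRANT EXECUTE statement for every stored procedure
-- in the database. CP_DbUser is an illustrative user name.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql += N'GRANT EXECUTE ON ' + QUOTENAME(SCHEMA_NAME(schema_id))
             + N'.' + QUOTENAME(name) + N' TO CP_DbUser;' + CHAR(13)
FROM   sys.procedures;

EXEC sp_executesql @sql;
```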
Triggers have notoriously bad performance and are generally reviled. That said, they can be very useful in specific situations. Triggers should not be implemented lightly and when they are used, great care should be taken to keep the triggers simple and efficient.
Deleting vs. Hiding Rows
In almost all situations, favor hiding rows by setting a flag in the row instead of deleting. Create a bit column IsHidden or IsInactive with a default value of false (0).
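A sketch of the pattern, using illustrative names: the flag defaults to false, a "delete" just sets it, and everyday queries filter it out.

```sql
-- Flag column defaulting to 0 (false).
ALTER TABLE Customer
    ADD IsInactive BIT NOT NULL
        CONSTRAINT DF_Customer_IsInactive DEFAULT 0;

-- Hide instead of delete (@CustomerId supplied by the caller).
UPDATE Customer SET IsInactive = 1 WHERE Id = @CustomerId;

-- Everyday queries exclude hidden rows.
SELECT Id, Name FROM Customer WHERE IsInactive = 0;
```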
When foreign key relationships are used in a database, deleting rows can become quite difficult. In addition, there is no implicit audit trail in SQL Server when rows are deleted. If something shows up on a report one day and is just missing the next with no explanation, that can be a maintenance nightmare. Finally, tables can become very fragmented when there is a lot of deleting and inserting going on and performance can suffer.
Cascading Updates and Deletes
Use of cascading updates and deletes are left to the discretion of the team. They can sometimes make testing a lot easier.
CODE mainly uses Azure SQL Database for development and test environments. Each database should have an XX_DbAdmin user who is a db_owner and an XX_DbUser who is a db_datareader and db_datawriter. For example, CP_DbAdmin is the db_owner login for the CloudPark app. Apps should use the XX_DbUser login while the XX_DbAdmin login can be used for development and maintenance tasks.
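A sketch of that setup for the CloudPark ("CP") example, assuming the server-level logins already exist:

```sql
-- Admin user: full owner of the database, for development and maintenance.
CREATE USER CP_DbAdmin FOR LOGIN CP_DbAdmin;
ALTER ROLE db_owner ADD MEMBER CP_DbAdmin;

-- App user: read and write data only.
CREATE USER CP_DbUser FOR LOGIN CP_DbUser;
ALTER ROLE db_datareader ADD MEMBER CP_DbUser;
ALTER ROLE db_datawriter ADD MEMBER CP_DbUser;
```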
A hole has to be punched in the firewall for the server (not the database) for every external IP address accessing it. We typically open this up for each developer’s work and home IP addresses.
Development databases in Azure SQL Database should be named ProjectNameDev. For example, CloudParkDev. Test databases, normally available to our clients should be named without the Dev suffix. Production databases should be named the same as the test database, but will reside on a different server.
You must use SQL Server Management Studio 2014 WITH SP1 or higher to work with Azure SQL Database. You can also use the SQL Server Object Explorer in Visual Studio.
For the most part, we use the “Basic” pricing tier for development and test environments, which provides 5 DTUs of performance. This generally works great for testing code, but can be painfully slow for maintaining the database structure and using other tools. We can temporarily bump up performance levels when these types of operations need to be done.
When a lot of intense work needs to be done, it is sometimes easier to copy the database from Azure, work on it locally and later re-deploy the database. This can be done via “Import Data Tier Application…” in the SSMS tool.
CODE standardized on using Entity Framework for SQL Server database and Azure SQL Database access. As mentioned above, certain tasks such as regenerating or updating data models from the database can be excruciatingly slow at the “Basic” pricing tier. You may want to request a temporary boost in server performance when this becomes necessary.
Primary keys, foreign keys and relationships must be defined properly and completely in the database for EF to work as expected.