While this isn’t as big as a Microsoft vs. Linux debate, and who doesn’t like those, this age old question can definitely get the blood pumping. As with any technology, each method has its benefits and costs.
A binary large object (or BLOB) is more of a method than a solution where binary data is stored directly in a database. Storing a BLOB in a SQL Server means first identifying how the binary data is formatted; Word documents, PDF, images, XML. You can write a BLOB to a database as either binary or character data, depending on the type of the field in your data source. It is recommended that you use the newer varchar(MAX), nvarchar(MAX), and varbinary(MAX) data types instead of Text, NText, and Image data types as these are slated to be removed in a future version of Microsoft SQL Server.
The other option you have is storing the files to a file system. The file itself is saved to the local server or available file server. The location, name, and other file metadata is then saved and referenced in the database.
So which one to use? Let’s take a look at the pros and cons for each. Interestingly, and probably not surprisingly, the pros of each are the con of the other.
- Since the file resides in the database, backups are easier.
- Inserting of the metadata and files in the same table allows for
- Transactional integrity with inserting the metadata and content. Most file systems aren’t transactional.
- Table locks.
- Performance. This needs a little explanation. While data types like varbinary allow for 2TB of data to be stored in the database, access and storage of these files is an issue. Video and other large files are actually stored in RAM before processing the data.
- Database size can increase dramatically.
- Access to the files is available through directory browsing.
- File servers can be scaled easily.
- Use of Storage Area Networks (SANs) can be used.
- Files can be deleted causing database integrity to be compromised.
- Backups are a little more difficult as both the database and file system must be maintained.
- Maybe…If the file system is over a slow network, access to the files can be slow.
What does this mean? While both have their pros and cons. The cons of the File System solution are easier to overcome. Security can be set up on the file server to prevent deletion of files and backups of both database and file system can be seen as a pro. If you set up your back up to do incremental backups, then the backups are actually smaller and can be done in less time.
While the prospect of having the file saved in the database and integrity ensured, the performance issues the BLOB method presents cannot be diminished. Since the file must be first streamed to an object to be then displayed or accessed by the client, file size does become an issue.
It must be noted that SQL Server 2008 has attempted to provide the best of both worlds by introducing FILESTREAM. From MSDN:
“Much data is unstructured, such as text documents, images, and videos. This unstructured data is often stored outside the database, separate from its structured data. This separation can cause data management complexities. Or, if the data is associated with structured storage, the file streaming capabilities and performance can be limited.
FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.”
Microsoft has identified the issues presented above. While this solution is a step up from BLOB, it is still untested for large file systems.
So let the arguments commence.