Save File. You’ve seen that command a thousand times as you work on your documents and images, but have you ever thought about what lies underneath, and how critical a file system is to everything you do on your computing platform? With Windows Server 2012, Microsoft is updating its ubiquitous and venerable NTFS with ReFS, one of the more important changes you’ll find in its latest server platform.
Let’s begin our discussion of ReFS by developing a working definition of a file system and its purpose. Generally speaking, a file system is software that stores and organizes data on a given hardware storage device or on hosted virtual storage. As applications and processes run on a computer, they need a mechanism that takes their output (any data they create or change) and stores and logically organizes it. Using this simplified description as a model, in any operating system the file system is an input/output kernel-mode “service” that communicates with that operating system’s file system drivers.
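To make that working definition concrete, here is a deliberately toy, in-memory model of the core job a file system performs: mapping human-readable names to stored bytes plus the metadata that organizes them. The class and method names are illustrative only; real file systems such as FAT, NTFS, and ReFS do this work against on-disk structures through kernel-mode drivers.

```python
import time

class ToyFileSystem:
    """A toy in-memory model of a file system's core job:
    map names to stored data and its organizing metadata."""

    def __init__(self):
        self._store = {}  # path -> (data, metadata)

    def write(self, path: str, data: bytes) -> None:
        # Store the bytes along with minimal metadata describing them.
        self._store[path] = (data, {"size": len(data), "mtime": time.time()})

    def read(self, path: str) -> bytes:
        data, _meta = self._store[path]
        return data

fs = ToyFileSystem()
fs.write("/docs/report.txt", b"quarterly numbers")
print(fs.read("/docs/report.txt"))  # prints: b'quarterly numbers'
```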
Throughout the history of Microsoft operating systems, systems administrators and users have had the FAT, FAT16, FAT32, exFAT, and NTFS file systems at their disposal. As the operating systems evolved from DOS-based to NT kernel-based, the supported file systems evolved as well.
For many reasons, including stability, scalability, and security, NTFS has been the file system of choice for server administrators. With the release of Server 2012, Microsoft is taking a big step forward in evolving and updating NTFS with the Resilient File System (ReFS). ReFS was designed to work in tandem with Storage Spaces; deploying both combats data loss from nasty disk subsystem problems such as “bit rot” and “lost writes,” which we’ll discuss later. It accomplishes this by retaining the key features of NTFS while extending the file system’s capabilities. One key feature of ReFS is its support for very large volume sizes.
These large volumes are limited to internal storage; ReFS does not support removable storage. Some key new ReFS capabilities include:
- Data storage integrity through data verification and automatic correction of the file system, where corruption can be identified and repaired before an application ever reads the bad data, in a “verify and auto-correct” process.
- Maintaining the high availability of Server-based data.
- Integration with Storage Spaces to reduce the cost of supporting and maintaining highly available data storage on both physical hosts and virtual machines, including support for very large volumes.
The Use of Checksumming in ReFS
ReFS, as a file system available in Windows Server 2012, enables IT professionals to breathe a little easier knowing that data storage is more dependable, thanks in part to the B+ tree structure ReFS uses to organize the file system. ReFS protects data using 64-bit checksums computed as data is written to a storage location (file, folder, volume), a built-in mechanism that maintains resiliency and data integrity. At the core of this approach, all ReFS metadata is checksummed; checksumming of file contents is optional and can be enabled through integrity streams.
In IT, terms such as metadata and checksum are used often enough and in different enough circumstances to be a bit confusing, so before we move on with our discussion let’s create a working definition of each term as they apply to dependable file storage and ReFS.
The term metadata is often defined as “data about data,” which is rather ambiguous. To make it concrete, think of any software-defined object as a collection of descriptive attributes or fields. So where do these attributes or fields come from? A file’s attributes, for example, are constructed from its metadata.
In a file system, metadata represents data with defining information about, for example, a file object. That data can include date and time of file creation, date and time of file modification(s), the account or security principal that created the file, file location, and so on. All of this data and information is crucial to the correctness and integrity of the file object (meaning the lack of corruption).
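The kinds of metadata described above can be inspected directly from a script. This sketch uses Python’s standard `os.stat` call, which asks the file system for a file’s metadata (size, modification time, owner, and so on); the throwaway file is created only so the example is self-contained.

```python
import os
import datetime
import tempfile

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as f:
    f.write(b"hello, metadata")
    path = f.name

# os.stat asks the file system for the file object's metadata.
info = os.stat(path)

print("size (bytes):", info.st_size)
print("modified:    ", datetime.datetime.fromtimestamp(info.st_mtime))
print("owner id:    ", info.st_uid)  # security principal (POSIX uid; a SID on Windows)

os.unlink(path)  # clean up the throwaway file
```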
A checksum is a small value computed from the contents of a unit of stored data, used to ensure that data’s integrity. It is a calculation that “checks” the validity of the data as stored or as it is being stored, and it can be produced by various algorithms. With ReFS, when data is written to disk, a 64-bit checksum is calculated and stored independently of the data itself; ReFS always checksums its metadata and can optionally checksum file contents as well. Because the checksum is stored in a different location than the data it covers, the file system can detect disk corruption, including lost writes and writes committed to the wrong location. Storing the checksum elsewhere on the disk (rather than in the file header) prevents a single fault from corrupting both at once. It also helps in detecting the dreaded “bit rot,” where over time storage media simply wears out (for example, losing magnetic orientation) and produces input/output errors. It is readily apparent that this technique is a key factor in keeping data available and “online.”
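The verify-on-read principle can be sketched in a few lines. This is not the actual ReFS checksum algorithm (which is internal to the file system); as a stand-in, the example derives a 64-bit value from SHA-256, records it in a table kept separate from the data it covers, and shows how a single flipped bit, the essence of bit rot, is caught on the next read.

```python
import hashlib

def checksum64(data: bytes) -> int:
    # 64-bit value derived from SHA-256. ReFS uses its own internal
    # checksum; this stand-in only illustrates the principle.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

# "Write" a block and record its checksum in a separate table,
# mimicking how ReFS stores checksums apart from the data they cover.
block = bytearray(b"important file contents")
checksum_table = {"block-0": checksum64(bytes(block))}

# A later read verifies the block against the stored checksum.
assert checksum64(bytes(block)) == checksum_table["block-0"]  # block is intact

# Simulate bit rot: flip a single bit in the stored data.
block[0] ^= 0x01
corrupted = checksum64(bytes(block)) != checksum_table["block-0"]
print("corruption detected:", corrupted)  # prints: corruption detected: True
```

Because the checksum lives outside the block, a fault that damages the data cannot also silently damage the value used to verify it, which is exactly the design point made above.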