Not many Windows developers seem aware of it, but Microsoft deliberately designed Windows NT to report incorrect file creation, modification, and access times. This decision is documented in the Knowledge Base in articles Q128126 and Q158588. For most purposes, this behavior is innocuous, but as Microsoft writes in Q158588,
After the automatic correction for Daylight Savings Time, monitoring programs comparing current time/date stamps to reference data that were not written using Win32 API calls which directly obtain/adjust to Universal Coordinated Time (UTC) will erroneously report time/date changes on files. Programs affected by this issue may include version-control software, database-synchronization software, software-distribution packages, backup software ....
This behavior is responsible for a flood of questions to the various support lists for CVS, following the first Sunday in April and the last Sunday in October, with scores of people complaining that CVS now reports erroneously that their files have been modified. This is commonly known as the "red file bug" because the WinCVS shell uses red icons to indicate modified files.
Over the past two years, several people have made concerted efforts to fix this bug and determine the correct file modification times for files both on NTFS and FAT volumes. It has proved surprisingly difficult to solve this problem correctly. I believe that I have finally gotten everything right and would like to share my solution with anyone else who cares about this issue.
An Example of the Problem
For example, run the following batch file on a computer where C: is an NTFS volume and A: is a FAT-formatted floppy disk. You will need write access to C:\ and A:\. This script will change your system time and date, so be prepared to restore them manually afterwards.
REM Test_DST_Bug.bat
REM File Modification Time Test
Date /T
Time /T
Date 10/27/2001
Time 10:00 AM
Echo Foo > A:\Foo.txt
Time 10:30 AM
Echo Foo > C:\Bar.txt
dir a:\Foo.txt c:\Bar.txt
Date 10/28/2001
dir a:\Foo.txt c:\Bar.txt
REM Prompt the user to reset the date and time.
date
time
The result looks something like this (abridged to save space):
C:\>Date 10/27/2001
C:\>dir a:\Foo.txt c:\Bar.txt
Directory of a:\
10/27/01 10:00a 6 foo.txt
Directory of c:\
10/27/01 10:30a 6 Bar.txt
C:\>Date 10/28/2001
C:\>dir a:\Foo.txt c:\Bar.txt
Directory of a:\
10/27/01 10:00a 6 foo.txt
Directory of c:\
10/27/01 09:30a 6 Bar.txt
On 27 October, Windows correctly reported that Bar.txt was modified half an hour after Foo.txt, but the next day, Windows changed its mind and decided that actually, Bar.txt was modified half an hour before Foo.txt. A naïve programmer might think this was a bug, but as Microsoft emphasized, this is how they want Windows to behave.
Solution to the Problem
Having spent a lot of time thinking about this problem, I want to share what I learned with people who want to get to the bottom of it, but I realize that most readers will just want to grab my solution and use it. Thus, I am putting the instructions for using the solution first.
The library I supply contains one exported function:
BOOL GetUTCFileModTime ( LPCTSTR name, time_t * utc_mod_time )
Just pass this function a filename (it can be a fully qualified path or just a file name in the current directory). The function returns TRUE for success and FALSE for failure, and stores the UTC file modification time in *utc_mod_time. Link with the JmgStat.lib library and you're off and running. Oh, yes. I have wrapped the library in the namespace Jonathan_M_Gilligan_95724E90_4A88_11d5_80F3_006008C7B14D. Someone else might have a function called GetUTCFileModTime(), and I don't want to collide with it, so I have concatenated my name with a GUID to produce a unique but recognizable namespace. Rather than typing such a long string for every invocation of the function, you may want to assign a namespace alias,
namespace jmg = Jonathan_M_Gilligan_95724E90_4A88_11d5_80F3_006008C7B14D;
and then you can call jmg::GetUTCFileModTime().
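Example:
#include <tchar.h>
#include <stdio.h>
#include <time.h>
// The library's declaration of GetUTCFileModTime() must also be visible here.
namespace jmg = Jonathan_M_Gilligan_95724E90_4A88_11d5_80F3_006008C7B14D;
int _tmain()
{
    time_t mod_time_1, mod_time_2;
    if ( jmg::GetUTCFileModTime( _T("foo.txt"), & mod_time_1 )
      && jmg::GetUTCFileModTime( _T("bar.txt"), & mod_time_2 ) )
    {
        // A larger time_t means a later (more recent) modification.
        if (mod_time_1 > mod_time_2)
        {
            _tprintf( _T("foo is newer.\n") );
        }
    }
    return 0;
}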
Why Windows has this Problem
The origin of this file-time problem lies in the early days of MS-DOS and PC-DOS. Unix and other operating systems designed for continuous use and network communications have long tended to store times in GMT (later UTC) format so computers in different time zones can accurately determine the order of different events. However, when Microsoft adapted DOS for the IBM PC, the personal computer was not envisioned in the context of wide-area networks, where it would be important to compare the modification times of files on the PC with those on another computer in another time zone.
In the interest of efficiently using the very limited resources of the computer, Microsoft wisely decided not to waste bits or processor cycles worrying about time zones. To put this decision in context, recall that the first two generations of PCs did not have battery-backed real-time clocks, so you would generally put DATE and TIME commands into your AUTOEXEC.BAT file to prompt you to enter the date and time manually when the computer booted.
Digression on Systems of Measuring Time...
By the time of WinNT, wide-area networks had become sufficiently common that Microsoft realized that the OS should measure time in some universal format that would allow different computers to compare the order (and separation) of events irrespective of their particular time zones. Although the details vary (different time structures measure time relative to different events), the net effect is that all times used internally in Win32 are measured with respect to UTC (what used to be called GMT).
Having once worked down the hall from the master atomic clock array for the United States at the National Institute of Standards and Technology in Boulder, I feel obligated to add a few words about time and systems for reporting time. Long ago, we used to refer time to GMT, or Greenwich Mean Time, which was kept by the Royal Observatory in Greenwich, England and was ultimately referred to the position of the sun as measured by the observatory. When atomic clocks became the standard for timekeeping, a new standard, called UTC, emerged. UTC is a bastard acronym. In English, it stands for "Coordinated Universal Time," while in French it stands for "le temps universel coordonné." Rather than using either CUT or TUC, the nonsense compromise acronym UTC was adopted.
To understand UTC, we must first understand the more abstract International Atomic Time (TAI, le temps atomique international), which measures the number of seconds that have elapsed since approximately 1 Jan 1958, as measured by caesium atomic clocks. The second is defined to be the amount of time required for 9 192 631 770 cycles of the caesium hyperfine frequency. However, neither the day nor the year is an exact multiple of this number, so we take TAI and correct it so that it corresponds to the actual motion of the earth by adding corrections such as "leap seconds." TAI measures raw atomic time. UTC measures time coordinated to the motion of the earth (i.e., so we don't end up having midnight while the sun is shining or January in midsummer). Details of what UTC really means, together with a more detailed history of timekeeping, can be found here.
UTC, Time Zones, and Windows File Times
So what does this all have to do with file modification times on Windows computers? Windows is stuck with some serious problems integrating FAT and NTFS files compatibly. FAT records file modification times with respect to the local time zone, while NTFS records file modification times (as well as creation and access times, which the original FAT format did not record) in UTC. The first question you may want to ask is, "How should Windows report these file times?" Clearly, it would be stupid for dir and Windows Explorer to report FAT file times in the local time zone and NTFS file times in UTC. If inconsistent formats were used, users would have great difficulty determining which of two files was more recent. We must thus choose to translate one of the two file time formats when we report to the user. Most users are likely to want to know the file modification time in their local time zone. This keeps things consistent with what people learned to expect under DOS and Win16. It is also more useful to most users, who may want to know how long ago they modified a file without looking up the offset of their local time zone from UTC.
It is straightforward to translate UTC to local time. You look up the offset, in minutes, between the local time zone and UTC, determine whether daylight savings is in effect and add either the standard or the daylight offset to the UTC time. However, daylight time throws a subtle wrench in the works if we try to go backwards...
The Problem with Daylight Time
If you want to translate a time in your local time zone into UTC, it seems a straightforward matter of determining whether daylight time is in effect locally and then subtracting either the standard or the daylight offset from the local time to arrive at UTC. A subtle problem emerges due to the fact that the mapping from UTC to local time is not one-to-one. Specifically, when we leave daylight savings time and set our clocks back, there are two distinct hour-long intervals of UTC time that map onto the same hour-long interval of local time. Consider the concrete case of 1:30 AM on the last Sunday in October. Let's suppose the local time zone is US Central Time (-6 hours offset from UTC when daylight time is not in effect, -5 hours when it is). At 06:00 UTC on Sunday 28 October 2001, the time in the US Central zone will be 01:00 (1:00 AM) and daylight time will be in effect. At 06:30 UTC, it will be 01:30 local. At 07:00 UTC, it will be 01:00:00 local and daylight time will not be in effect. At 07:30 UTC, it will be 01:30 local. Thus, for all times 01:00 ≤ t < 02:00 local, there will be two distinct UTC times that correspond to the given local time. This degenerate mapping means that we can't be sure which UTC time corresponds to 01:30 local time. If a FAT file is marked as having been modified at 01:30 on 28 Oct. 2001, we can't determine the UTC time.
When translating local file times to UTC and vice versa, Microsoft made a strange decision. We would like the following code to produce out_time equal to in_time.
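#include <windows.h>
FILETIME in_time, local_time, out_time;             // in_time: a UTC file time
FileTimeToLocalFileTime(& in_time, & local_time);   // UTC -> local
LocalFileTimeToFileTime(& local_time, & out_time);  // local -> UTC; ideally out_time == in_time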
The problem is that if the local time zone is US Central (UTC - 6 hours for standard time, UTC - 5 hours for daylight time), then in_time = 06:30:00 Oct 28 2001 UTC and in_time = 07:30:00 Oct 28 2001 UTC both map onto the same local time, 01:30:00 Oct 28 2001, and we don't know which branch to choose when we execute LocalFileTimeToFileTime(). Microsoft picked an incorrect, but unambiguously invertible algorithm: move all times up an hour when daylight time is in effect on the local computer, irrespective of the DST state of the time being converted. Thus, if DST is in effect on my local computer, FileTimeToLocalFileTime converts 06:30:00 Oct 28 2001 UTC to 01:30:00 CDT and 07:30:00 Oct 28 2001 UTC to 02:30:00 CDT. If I call the same function with the same arguments when DST is not in effect on my local computer, FileTimeToLocalFileTime converts 06:30:00 UTC to 00:30:00 CST and 07:30:00 UTC to 01:30:00 CST.
It may seem strange that this would affect the C library call stat, which allegedly returns the UTC modification time of a file. If you examine the source code for Microsoft's C library, you find that it gets the modification time thus:
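WIN32_FIND_DATA find_buf;
HANDLE hFile;
FILETIME local_ft;
time_t mod_time;
hFile = FindFirstFile ( file_name, &find_buf );   // ftLastWriteTime is returned in UTC
if ( hFile == INVALID_HANDLE_VALUE ) return -1;  // no such file
FindClose ( hFile );
// UTC -> local, using the DST state of the current system time
FileTimeToLocalFileTime ( &find_buf.ftLastWriteTime, &local_ft );
// local -> time_t, using the DST state of the file time; __secret_microsoft_converter
// stands in for a private CRT routine, not a callable function
mod_time = __secret_microsoft_converter(local_ft);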
For a FAT file, the conversions work like this:
1. The raw file modification time (local time, as stored on disk) is converted to UTC by Windows itself, in the same way as LocalFileTimeToFileTime().
2. UTC is converted back to local time by FileTimeToLocalFileTime(). Note that this exactly reverses the effect of step 1, so we are left with the correct local modification time.
3. Local time is converted to "correct" UTC by the private function.
For an NTFS file, the conversions work like this:
1. The raw file modification time is already in UTC, so we don't need to convert it.
2. UTC is converted to local time by FileTimeToLocalFileTime(). This applies a DST correction according to the DST setting of the computer's system time, irrespective of the DST setting at the file modification time.
3. Local time is converted to "correct" UTC by the private function. Note that this does not reverse the effect of step 2, because step 3 uses the DST setting for the file modification time, not the system time.
This explains the problem I showed at the top of this article: the time reported by dir for a file on an NTFS volume changes by an hour as we move into or out of daylight savings time, even though I haven't touched the file. FAT modification times are stable across DST changes.
Categorizing the problem
I can think of three situations in which this inconsistency in reporting file times may cause problems:
1. You may be comparing a file on an NTFS volume with a time_t value stored in a file (or memory). This is frequently seen in CVS and leads to the infamous "red file" problem on the first Sunday of April and the last Sunday of October.
2. You may be comparing a file on a FAT volume with a time_t value.
3. You may be comparing a file on a FAT volume with a file on an NTFS volume.
Solutions
- For case (1), it's simple. Get the file times using the Windows API call GetFileTime() instead of the C library stat(), and convert the FILETIME to time_t. A FILETIME counts 100-nanosecond intervals since 1 Jan 1601 UTC, whereas time_t counts seconds since 1 Jan 1970 UTC, so subtract the difference between the two origins (116,444,736,000,000,000 intervals) and divide by 10,000,000 to convert 100-nanosecond units to seconds.
- For case (2), stat() will work and return a time_t that you can compare to the stored one. If you must use GetFileTime(), do not use LocalFileTimeToFileTime(): it applies the daylight state of the current system time, not the daylight state of the file time in the argument. Fortunately, the C library mktime() function will correctly convert the time if you set the tm_isdst field of the tm struct correctly.
There is a bit of a chicken-and-egg problem here: Windows does not supply a good API call to let you determine whether DST was in effect at a given time. Fortunately, if you live in the US or another country that uses the same rule (daylight time starts at 2:00 AM on the first Sunday of April and ends at 2:00 AM on the last Sunday in October, the US rule at the time of writing), you can set tm_isdst to a negative number and mktime() will automatically determine whether daylight time applies. If the file was modified between 1:00 and 2:00 AM on the last Sunday in October, however, the local time is ambiguous, and it is unclear which of the two possible UTC times mktime() will return.
People in time zones that do not follow the usual US daylight rule must brute-force the daylight time problem by retrieving the applicable TIME_ZONE_INFORMATION structure with GetTimeZoneInformation() and manually calculating whether daylight time applies.
- For case (3), the best bet is to follow the instructions for case (2) above and compare the resulting UTC time_t with the time for the NTFS file determined as in case (1).
The library (available for download from the link at the top of this article) implements this solution, checking which file system the file is stored on.
License
This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author via the discussion board below.