Not many Windows developers seem aware of it, but Microsoft deliberately designed Windows NT to report incorrect file creation,
modification, and access times. This decision is documented in the Knowledge Base in articles Q128126 and Q158588. For most purposes,
this behavior is innocuous, but as Microsoft writes in Q158588,
After the automatic correction for Daylight Savings Time, monitoring programs comparing current time/date stamps to reference data that
were not written using Win32 API calls which directly obtain/adjust to Universal Coordinated Time (UTC) will erroneously report time/date
changes on files. Programs affected by this issue may include version-control software, database-synchronization software,
software-distribution packages, backup software ....
This behavior is responsible for a flood of questions to the various support lists for CVS, following the first Sunday in April
and the last Sunday in October, with scores of people complaining that CVS now reports erroneously that their files have been modified.
This is commonly known as the "red file bug" because the WinCVS shell uses red icons to indicate modified files.
Over the past two years, several people have made concerted efforts to fix this bug and determine the correct file modification times for
files both on NTFS and FAT volumes. It has proved surprisingly difficult to solve this problem correctly. I believe that I have finally
gotten everything right and would like to share my solution with anyone else who cares about this issue.
An Example of the Problem
For example, run the following batch file on a computer where C: is an NTFS volume and A: is a FAT-formatted floppy disk.
You will need write access to C:\ and A:\. This script will change your system time and date, so be prepared to manually
restore them afterwards.
REM Test_DST_Bug.bat
REM File Modification Time Test
Date /T
Time /T
Date 10/27/2001
Time 10:00 AM
Echo Foo > A:\Foo.txt
Time 10:30 AM
Echo Foo > C:\Bar.txt
dir a:\Foo.txt c:\Bar.txt
Date 10/28/2001
dir a:\Foo.txt c:\Bar.txt
REM Prompt the user to reset the date and time.
date
time
The result looks something like this (abridged to save space):
C:\>Date 10/27/2001
C:\>dir a:\Foo.txt c:\Bar.txt
Directory of a:\
10/27/01 10:00a 6 foo.txt
Directory of c:\
10/27/01 10:30a 6 Bar.txt
C:\>Date 10/28/2001
C:\>dir a:\Foo.txt c:\Bar.txt
Directory of a:\
10/27/01 10:00a 6 foo.txt
Directory of c:\
10/27/01 09:30a 6 Bar.txt
On 27 October, Windows correctly reports that Bar.txt was modified half an hour after Foo.txt, but
the next day, Windows has changed its mind and decided that actually, Bar.txt was modified half an hour before
Foo.txt. A naïve programmer might think this was a bug, but as Microsoft emphasized, this is how they want
Windows to behave.
Solution to the problem
Having spent a lot of time thinking about this problem, I wanted to share the information with people who want to get to the bottom of it,
but I realize that most readers here will just want to grab my solutions and use them. Thus, I am putting the instructions here for using the
solution.
The library I supply contains one exported function: BOOL GetUTCFileModTime ( LPCTSTR name, time_t * utc_mod_time ). Just pass this function a filename
(it can be a fully qualified path or just a file name in the current directory). The function will return TRUE for success, FALSE for failure, and will store
the UTC file modification time in * utc_mod_time. Link with the JmgStat.lib library and you're off and running.
Oh, yes. I have wrapped the library in the namespace Jonathan_M_Gilligan_95724E90_4A88_11d5_80F3_006008C7B14D.
Someone else might have a function called GetUTCFileModTime() and I don't want to collide with it, so I have concatenated my initials with a GUID to
produce a unique but recognizable namespace. Rather than typing such a long string for every invocation of the function, you may want to assign a namespace alias,
namespace jmg = Jonathan_M_Gilligan_95724E90_4A88_11d5_80F3_006008C7B14D; and then you can call jmg::GetUTCFileModTime();
Example:
namespace jmg = Jonathan_M_Gilligan_95724E90_4A88_11d5_80F3_006008C7B14D;
time_t mod_time_1, mod_time_2;
if ( jmg::GetUTCFileModTime( _T("foo.txt"), & mod_time_1 )
&& jmg::GetUTCFileModTime( _T("bar.txt"), & mod_time_2 ) )
{
if (mod_time_1 < mod_time_2)  // a smaller time_t means an earlier modification, i.e. an older file
{
_tprintf( _T("foo is older.\n") );
}
}
Why Windows has this problem
The origin of this file-time problem lies in the early
days of MS-DOS and PC-DOS. Unix and other operating systems designed for
continuous use and network communications have long tended to store times in GMT
(later UTC) format so computers in different time zones can accurately determine
the order of different events. However, when Microsoft adapted DOS for the IBM PC, the
personal computer was not envisioned in the context of wide-area networks, where
it would be important to compare the modification times of files on the PC with
those on another computer in another time zone.
In the interest of efficiently using the very limited resources of the
computer, Microsoft wisely decided not to waste bits or processor cycles
worrying about time zones. To put this decision in context, recall that the
first two generations of PCs did not have battery-backed real-time clocks,
so you would generally put DATE and TIME commands
into your AUTOEXEC.BAT file to prompt you to enter the date and
time manually when the computer booted.
By the time of WinNT, wide-area networks had become
sufficiently common that Microsoft realized that the OS should
measure time in some universal format that would allow different
computers to compare the order (and separation) of events irrespective
of their particular time zones. Although the details vary (different time
structures measure time relative to different events), the net effect is that
all times used internally in Win32 measure time with respect to UTC (what used
to be called GMT).
Digression on systems of measuring time...
Having once worked down the hall from the master atomic
clock array for the United States at the National Institute of Standards and
Technology in Boulder, I feel obligated to add a few words about time and
systems for reporting time. Long ago, we used to refer time to GMT, or Greenwich
Mean Time, which was kept by the Royal Observatory in Greenwich, England and was
ultimately referred to the position of the sun as measured by the observatory.
When atomic clocks became the standard for timekeeping, a new standard, called
UTC, emerged. UTC is a bastard acronym. In English, it stands for "Coordinated
Universal Time," while in French it stands for "le temps universel coordonné."
Rather than using either CUT or TUC, the nonsense compromise acronym UTC was
adopted.
To understand UTC, we must first understand the more abstract International
Atomic Time (TAI, le temps atomique international), which measures the
number of seconds that have elapsed since approximately 1 Jan 1958, as measured
by caesium atomic clocks. The second is defined to be the amount of time
required for 9 192 631 770 cycles of the caesium hyperfine
frequency. However, neither the day nor the year is an exact multiple of the
second, so we take TAI and correct it so that it corresponds to the actual
motion of the earth by adding corrections such as "leap seconds."
TAI measures raw atomic time. UTC measures time coordinated to the
motion of the earth (i.e., so we don't end up having midnight while the sun
is shining or January in midsummer). Details of what UTC
really means, together with a more detailed history of timekeeping, can be found at http://ecco.bsee.swin.edu.au/chronos/GMT-explained.html.
UTC, time zones, and Windows file times
So what does this all have to do with file modification
times on Windows computers? Windows is stuck with some serious problems
integrating FAT and NTFS files compatibly. FAT records file modification times
with respect to the local time zone, while NTFS records file modification (as
well as creation and access times, which FAT does not record) in UTC. The first
question you may want to ask is, "How should Windows report these file
times?" Clearly it would be stupid for dir
and Windows Explorer to report FAT file times in the local time zone and
NTFS file times in UTC. If inconsistent formats were used, users would have
great difficulty determining which of two files was more recent. We
must thus choose to translate one of the two file time formats
when we report to the user. Most users are likely to want to know the file
modification time in their local time zone. This keeps things consistent with
what people learned to expect under DOS and Win16. It also is more useful to
most users, who may want to know how long ago they modified a file without
looking up the offset of their local time zone from UTC.
It is straightforward to translate UTC to local time.
You look up the offset, in minutes, between the local time zone and UTC,
determine whether daylight savings is in effect and add either the standard or
the daylight offset to the UTC time. However, daylight time throws a subtle
wrench in the works if we try to go backwards...
The problem with daylight time
If you want to translate a
time in your local time zone into UTC, it seems a straightforward
matter of determining whether daylight time is in effect locally and then subtracting
either the standard or the daylight offset from the local time to arrive at UTC.
A subtle problem emerges due to the fact that the mapping from UTC to local time is
not one-to-one. Specifically, when we leave daylight savings time and set our clocks
back, there are two distinct hour-long intervals of UTC time that map onto the same
hour-long interval of local time.
Consider the concrete case of 1:30
AM on the last Sunday in October. Let's suppose the local time zone is
US Central Time (-6 hours offset from UTC when daylight time is not in effect, -5
hours when it is). At 06:00 UTC on Sunday 28 October 2001, the time in the
US Central zone will be 01:00 (1:00 AM) and daylight time will be in effect.
At 06:30 UTC, it will be 01:30 local. At 07:00 UTC, it will
be 01:00 local and daylight time will not be in effect. At 07:30 UTC, it
will be 01:30 local. Thus, for all times 01:00 ≤ t < 02:00 local, there will
be two distinct UTC times that correspond to the given local time. This
degenerate mapping means that we can't be sure which UTC time corresponds to
01:30 local time. If a FAT file is marked as having been modified at 01:30
on 28 Oct. 2001, we can't determine the UTC time.
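The ambiguity can be checked with a few lines of arithmetic. What follows is a minimal sketch, not part of the library. It hard-codes the US Central offsets and the 28 October 2001 transition from the example above, and represents times as minutes since 00:00 UTC on that day. The helper names UtcToCentralLocal and CountUtcPreimages are hypothetical.

```cpp
#include <cassert>

// All times are minutes since 00:00 UTC on 28 Oct 2001 (an assumption
// made only to keep the sketch small).
const int kStandardOffsetMin = -6 * 60;  // US Central standard time
const int kDaylightOffsetMin = -5 * 60;  // US Central daylight time
const int kFallBackUtcMin    = 7 * 60;   // daylight time ends at 07:00 UTC

// Correct UTC -> local conversion for this one day: the offset depends on
// whether daylight time was in effect at the instant being converted.
int UtcToCentralLocal(int utc_min)
{
    const bool dst = utc_min < kFallBackUtcMin;
    return utc_min + (dst ? kDaylightOffsetMin : kStandardOffsetMin);
}

// Count how many UTC minutes in [from_utc_min, to_utc_min) map onto the
// given local minute. A count of 2 means the local time is ambiguous.
int CountUtcPreimages(int local_min, int from_utc_min, int to_utc_min)
{
    assert(from_utc_min <= to_utc_min);
    int count = 0;
    for (int utc = from_utc_min; utc < to_utc_min; ++utc) {
        if (UtcToCentralLocal(utc) == local_min) {
            ++count;
        }
    }
    return count;
}
```

With these definitions, UtcToCentralLocal(6 * 60 + 30) and UtcToCentralLocal(7 * 60 + 30) both return 90 (01:30 local), and CountUtcPreimages(90, 0, 24 * 60) returns 2, while a local time outside the repeated hour, such as 02:30, has exactly one preimage.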
When translating local file times to UTC and vice-versa, Microsoft made a strange decision.
We would like the following code to produce out_time equal
to in_time:
FILETIME in_time, local_time, out_time;
// ... in_time is set to a UTC file time, e.g. by GetFileTime() ...
FileTimeToLocalFileTime(& in_time, & local_time);   // UTC -> local
LocalFileTimeToFileTime(& local_time, & out_time);  // local -> UTC
The problem is that if the local time zone is US Central (UTC - 6 hours for standard time,
UTC - 5 hours for daylight time) then in_time = 06:30:00 Oct 28 2001
and in_time = 07:30:00 Oct 28 2001 both map onto the same local time, 01:30:00
Oct 28 2001 and we don't know which branch to choose when we execute
LocalFileTimeToFileTime().
Microsoft picked an incorrect, but unambiguously invertible algorithm: move all times up an hour when
daylight time is in effect on the local computer, irrespective of the DST state of the time
being converted. Thus, if DST is in effect on my local computer, FileTimeToLocalFileTime
converts 06:30:00 Oct 28 2001 UTC to 01:30:00 CDT and 07:30:00 Oct 28 2001 UTC to 02:30:00 CDT. If
I call the same function with the same arguments, but when DST is not in effect on my local computer,
FileTimeToLocalFileTime will convert 06:30:00 UTC to 00:30:00 CST and 07:30:00 UTC to
01:30:00 CST.
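The rule just described can be modelled in a few lines. This is a sketch of the behavior as the text describes it, not Microsoft's actual implementation. The names ModelFileTimeToLocal and ModelLocalToFileTime are hypothetical, times are seconds since 00:00 UTC on 28 October 2001, and the zone is assumed to be US Central.

```cpp
#include <cassert>
#include <cstdint>

const int64_t kHour = 3600;
const int64_t kCentralStandardOffset = -6 * kHour;  // assumed time zone

// Model of the described rule: the offset depends only on whether daylight
// time is in effect on the computer *now*, never on the time being converted.
int64_t ModelFileTimeToLocal(int64_t utc_sec, bool system_dst_active)
{
    return utc_sec + kCentralStandardOffset + (system_dst_active ? kHour : 0);
}

// The inverse under the same rule. Because the offset is a constant for a
// given system state, the round trip is always exact.
int64_t ModelLocalToFileTime(int64_t local_sec, bool system_dst_active)
{
    return local_sec - kCentralStandardOffset - (system_dst_active ? kHour : 0);
}

// Reproduces the four conversions quoted in the text.
void CheckFiguresFromText()
{
    const int64_t utc_0630 = 6 * kHour + 1800;
    const int64_t utc_0730 = 7 * kHour + 1800;
    assert(ModelFileTimeToLocal(utc_0630, true)  == 1 * kHour + 1800);  // 01:30
    assert(ModelFileTimeToLocal(utc_0730, true)  == 2 * kHour + 1800);  // 02:30
    assert(ModelFileTimeToLocal(utc_0630, false) == 0 * kHour + 1800);  // 00:30
    assert(ModelFileTimeToLocal(utc_0730, false) == 1 * kHour + 1800);  // 01:30
}
```

The design trade-off is visible in the model: the conversion is trivially invertible for any fixed system state, but the same UTC input yields local times an hour apart depending on when the conversion is run.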
It may seem strange that this would affect the C library call stat, which allegedly
returns the UTC modification time of a file. If you examine the source code for Microsoft's C library,
you find that it gets the modification time thus:
WIN32_FIND_DATA find_buf;
HANDLE hFile;
FILETIME local_ft;
time_t mod_time;
hFile = FindFirstFile ( file_name, &find_buf );
FileTimeToLocalFileTime ( &find_buf.ftLastWriteTime, &local_ft );
mod_time = __secret_microsoft_converter(local_ft);
For a FAT file, the conversions work like this:
1. raw file modification is converted to UTC by LocalFileTimeToFileTime()
2. UTC converted back to local by FileTimeToLocalFileTime(). Note that this
exactly reverses the effect of step 1, so we are left with the correct local modification time.
3. local time converted to "correct" UTC by private function
For an NTFS file, the conversions work like this:
1. raw file modification is already in UTC, so we don't need to convert it.
2. UTC converted to local by FileTimeToLocalFileTime(). This applies a DST correction
according to the DST setting of the computer's system time, irrespective of the DST setting at the
file modification time.
3. local time converted to "correct" UTC by private function. Note that this does not
reverse the effect of step 2 because in step 3, we use the DST setting for the file modification time,
not the system time.
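The two conversion chains can be traced numerically. The sketch below is a model built from the steps listed above, not the C library's source. ModelStatFat and ModelStatNtfs are hypothetical names, the zone is assumed to be US Central, and times are seconds since 00:00 on 27 October 2001 (local for the FAT stamp, UTC for the NTFS stamp).

```cpp
#include <cassert>
#include <cstdint>

const int64_t kHour = 3600;
const int64_t kStdOffset = -6 * kHour;  // US Central, assumed

// Offset from UTC to local time for a given daylight state.
int64_t Offset(bool dst) { return kStdOffset + (dst ? kHour : 0); }

// FAT: the raw stamp is local time. Steps 1 and 2 use the system's daylight
// state and cancel each other; step 3 uses the file's own daylight state.
int64_t ModelStatFat(int64_t raw_local, bool file_dst, bool system_dst)
{
    const int64_t utc   = raw_local - Offset(system_dst);  // step 1
    const int64_t local = utc + Offset(system_dst);        // step 2
    return local - Offset(file_dst);                       // step 3
}

// NTFS: the raw stamp is already UTC, so nothing cancels step 2.
int64_t ModelStatNtfs(int64_t raw_utc, bool file_dst, bool system_dst)
{
    const int64_t local = raw_utc + Offset(system_dst);    // step 2
    return local - Offset(file_dst);                       // step 3
}

// Replays the batch-file example: both files are written on 27 October
// (daylight time), then examined before and after the switch.
void CheckBatchFileExample()
{
    const int64_t foo_local = 10 * kHour;         // 10:00 local, FAT
    const int64_t bar_utc   = 15 * kHour + 1800;  // 10:30 CDT = 15:30 UTC, NTFS
    assert(ModelStatFat(foo_local, true, true) ==
           ModelStatFat(foo_local, true, false));            // FAT is stable
    assert(ModelStatNtfs(bar_utc, true, true) == bar_utc);   // correct on 27 Oct
    assert(ModelStatNtfs(bar_utc, true, false) == bar_utc - kHour);  // off by 1 h
}
```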
This explains the problem I showed at the top of this article: The time
reported by dir for a file on an NTFS volume changes by an hour
as we move into or out of daylight savings
time, despite the fact that I haven't touched the file. FAT modification times
are stable across DST.
Categorizing the problem
There are 3 possible ways I can think of where this inconsistency in
reporting file times may cause problems:
1. You may be comparing a file on an NTFS volume with a
time_t value stored in a file (or memory). This is frequently seen in CVS and
leads to the infamous "red-file" problem on the first Sunday of April and the
last Sunday of October.
2. You may be comparing a file on a FAT volume with a
time_t value.
3. You may be comparing a file on a FAT volume with a file on an NTFS
volume.
Solutions:
- For case (1), it's simple. Get the file times using the Windows API call
GetFileTime()
instead of using the C library stat(), and convert the FILETIME
to time_t by subtracting the FILETIME value of the time_t origin (Jan 1 1970; FILETIME itself counts
from Jan 1 1601) and dividing by 10,000,000 to
convert 100-nanosecond units to seconds.
- For case (2), stat()
will work and return a time_t that you can compare to the stored one. If you must use GetFileTime(), do not
use LocalFileTimeToFileTime(). This function will apply the daylight state of the
current system time, not the daylight status of the file time in the argument. Fortunately, the C library
mktime() function will correctly convert the time if you correctly set the tm_isdst
field of the tm struct.
There is a bit of a chicken-and-egg problem here. Windows does not supply a good API call to let
you determine whether DST was in effect at a given time. Fortunately, residents of the US and other countries
that use the same logic (daylight time starts at 2:00 AM on the first Sunday of April and ends at 2:00 AM on
the last Sunday in October) can set tm_isdst to a negative number, and
mktime() will automatically
determine whether daylight time applies or not. If the file was modified in
the window 1:00-2:00 am on the last Sunday in October, it's ambiguous how
mktime() computes the modification time.
People in time zones that do not follow the usual US daylight rule must brute-force the daylight time
problem by retrieving the applicable TIME_ZONE_INFORMATION structure with
GetTimeZoneInformation and manually calculating whether daylight time applies.
- For case (3), the best bet is to follow the
instructions for case 2 above and compare the resultant UTC
time_t with the time
for the NTFS file determined in case (1).
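The FILETIME-to-time_t arithmetic for case (1) can be sketched portably. This is an illustration rather than the library's code. The function names are hypothetical, the two 32-bit arguments stand in for the dwLowDateTime and dwHighDateTime members of a FILETIME, and the result is returned as a 64-bit integer that the caller would cast to time_t.

```cpp
#include <cassert>
#include <cstdint>

// FILETIME counts 100-nanosecond ticks since 1 Jan 1601 (UTC); time_t counts
// seconds since 1 Jan 1970. The two origins are 11,644,473,600 seconds apart.
const int64_t kTicksPerSecond      = 10000000LL;
const int64_t kUnixEpochInFileTime = 116444736000000000LL;

// Convert a 64-bit tick count to seconds since 1 Jan 1970. Integer division
// truncates toward zero, discarding the sub-second part.
int64_t FileTimeTicksToUnixSeconds(uint64_t ticks)
{
    assert(ticks <= static_cast<uint64_t>(INT64_MAX));  // must fit in int64_t
    return (static_cast<int64_t>(ticks) - kUnixEpochInFileTime) / kTicksPerSecond;
}

// Same conversion starting from the two 32-bit halves of a FILETIME.
int64_t FileTimeToUnixSeconds(uint32_t low_date_time, uint32_t high_date_time)
{
    const uint64_t ticks =
        (static_cast<uint64_t>(high_date_time) << 32) | low_date_time;
    return FileTimeTicksToUnixSeconds(ticks);
}
```

On Windows, the halves would come from the FILETIME filled in by GetFileTime(). Combining them through a 64-bit integer, as here, avoids relying on the alignment of the FILETIME structure.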
The library (available for download from the link at the
top of this article) implements this solution,
checking which filesystem each file is stored on.
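For readers who need to work out the daylight state by hand, as mentioned under case (2), the US rule quoted above can be evaluated with simple calendar arithmetic. The following is a sketch under that one rule only (first Sunday of April to last Sunday of October, switching at 2:00 AM). It is not taken from the library, and the function names are hypothetical. The return value follows the tm_isdst convention: positive for daylight time, zero for standard time, negative when the answer cannot be determined.

```cpp
#include <cassert>

// Day of week for a Gregorian date, 0 = Sunday (Sakamoto's method).
int DayOfWeek(int year, int month, int day)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3) {
        year -= 1;
    }
    return (year + year / 4 - year / 100 + year / 400 + t[month - 1] + day) % 7;
}

// Day of the month on which the first Sunday of April falls.
int FirstSundayOfApril(int year)
{
    return 1 + (7 - DayOfWeek(year, 4, 1)) % 7;
}

// Day of the month on which the last Sunday of October falls.
int LastSundayOfOctober(int year)
{
    return 31 - DayOfWeek(year, 10, 31);
}

// Daylight state of a local wall-clock time under the rule quoted above.
// Returns 1 (daylight), 0 (standard) or -1 (the repeated 1:00-2:00 AM hour
// on the last Sunday of October, which is ambiguous).
int UsDstStateForLocalTime(int year, int month, int day, int hour)
{
    assert(month >= 1 && month <= 12 && hour >= 0 && hour < 24);
    if (month < 4 || month > 10) return 0;
    if (month > 4 && month < 10) return 1;
    if (month == 4) {
        const int start = FirstSundayOfApril(year);
        if (day != start) return day > start ? 1 : 0;
        // Local times from 2:00 to 2:59 AM do not exist on this day;
        // this sketch simply treats them as daylight time.
        return hour >= 2 ? 1 : 0;
    }
    const int end = LastSundayOfOctober(year);
    if (day != end) return day < end ? 1 : 0;
    if (hour < 1) return 1;
    if (hour == 1) return -1;
    return 0;
}
```

For the dates used in this article, UsDstStateForLocalTime(2001, 10, 27, 10) reports daylight time and UsDstStateForLocalTime(2001, 10, 28, 1) reports the ambiguous hour. Time zones with other rules would need the transition dates from GetTimeZoneInformation instead of the two hard-coded Sundays.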