Introduction

Not many Windows developers seem aware of it, but Microsoft deliberately designed Windows NT to report incorrect file creation, modification, and access times. This decision is documented in the Knowledge Base in articles Q128126 and Q158588. For most purposes, this behavior is innocuous, but as Microsoft writes in Q158588:

"After the automatic correction for Daylight Savings Time, monitoring programs comparing current time/date stamps to reference data that were not written using Win32 API calls which directly obtain/adjust to Universal Coordinated Time (UTC) will erroneously report time/date changes on files. Programs affected by this issue may include version-control software, database-synchronization software, software-distribution packages, backup software ..."

This behavior is responsible for a flood of questions to the various support lists for CVS following the first Sunday in April and the last Sunday in October, with scores of people complaining that CVS now erroneously reports that their files have been modified. This is commonly known as the "red file bug" because the WinCVS shell uses red icons to indicate modified files.

Over the past two years, several people have made concerted efforts to fix this bug and determine the correct file modification times for files on both NTFS and FAT volumes. It has proved surprisingly difficult to solve this problem correctly. I believe that I have finally gotten everything right and would like to share my solution with anyone else who cares about this issue.

An Example of the Problem

To see the problem, run the following batch file on a computer where C: is an NTFS volume and A: is a FAT-formatted floppy disk. You will need write access to C:\ and A:\. This script will change your system time and date, so be prepared to restore them manually afterwards.

    REM Test_DST_Bug.bat
    REM File Modification Time Test
    Date /T
    Time /T
    Date 10/27/2001
    Time 10:00 AM
    Echo Foo > A:\Foo.txt
    Time 10:30 AM
    Echo Foo > C:\Bar.txt
    dir a:\Foo.txt c:\Bar.txt
    Date 10/28/2001
    dir a:\Foo.txt c:\Bar.txt
    REM Prompt the user to reset the date and time.
    date
    time

The result looks something like this (abridged to save space):

    C:\>Date 10/27/2001
    C:\>dir a:\Foo.txt c:\Bar.txt
    Directory of a:\
    10/27/01  10:00a  6 foo.txt
    Directory of c:\
    10/27/01  10:30a  6 Bar.txt

    C:\>Date 10/28/2001
    C:\>dir a:\Foo.txt c:\Bar.txt
    Directory of a:\
    10/27/01  10:00a  6 foo.txt
    Directory of c:\
    10/27/01  09:30a  6 Bar.txt

On 27 October, Windows correctly reports that Bar.txt was modified half an hour after Foo.txt, but the next day, Windows has changed its mind and decided that Bar.txt was actually modified half an hour before Foo.txt. A naïve programmer might think this was a bug, but as Microsoft emphasized, this is how they want Windows to behave.

Solution to the problem

Having spent a lot of time thinking about this problem, I want to share what I learned with people who wish to get to the bottom of it, but I realize that most readers will just want to grab my solution and use it. I am therefore putting the instructions for using the solution here. The library I supply contains one exported function:

    namespace jmg = Jonathan_M_Gilligan_95724E90_4A88_11d5_80F3_006008C7B14D;

    time_t mod_time_1, mod_time_2;

    if ( jmg::GetUTCFileModTime( _T("foo.txt"), & mod_time_1 )
         && jmg::GetUTCFileModTime( _T("bar.txt"), & mod_time_2 ) )
    {
        // A larger time_t means a later modification time.
        if (mod_time_1 > mod_time_2)
        {
            _tprintf( _T("foo is newer.\n") );
        }
    }

Why Windows has this problem

The origin of this file-time problem lies in the early days of MS-DOS and PC-DOS. Unix and other operating systems designed for continuous use and network communications have long tended to store times in GMT (later UTC) format so computers in different time zones can accurately determine the order of different events.
However, when Microsoft adapted DOS for the IBM PC, the personal computer was not envisioned in the context of wide-area networks, where it would be important to compare the modification times of files on the PC with those on another computer in another time zone. In the interest of efficiently using the very limited resources of the
computer, Microsoft wisely decided not to waste bits or processor cycles
worrying about time zones. To put this decision in context, recall that the
first two generations of PCs did not have battery-backed real-time clocks,
so you would generally put

By the time of WinNT, wide-area networks had become sufficiently common that Microsoft realized that the OS should measure time in some universal format that would allow different computers to compare the order (and separation) of events irrespective of their particular time zones. Although the details vary (different time structures measure time relative to different events), the net effect is that all times used internally in Win32 are measured with respect to UTC (what used to be called GMT).

Digression on systems of measuring time...

Having once worked down the hall from the master atomic clock array for the United States at the National Institute of Standards and Technology in Boulder, I feel obligated to add a few words about time and systems for reporting time. Long ago, we used to refer time to GMT, or Greenwich Mean Time, which was kept by the Royal Observatory in Greenwich, England and was ultimately referred to the position of the sun as measured by the observatory. When atomic clocks became the standard for timekeeping, a new standard, called UTC, emerged. UTC is a bastard acronym. In English, it stands for "Coordinated Universal Time," while in French it stands for "le temps universel coordonné." Rather than using either CUT or TUC, the nonsense compromise acronym UTC was adopted.

To understand UTC, we must first understand the more abstract International Atomic Time (TAI, le temps atomique international), which measures the number of seconds that have elapsed since approximately 1 Jan 1958, as measured by caesium atomic clocks. The second is defined to be the amount of time required for 9 192 631 770 cycles of the caesium hyperfine frequency. However, neither the day nor the year is an exact multiple of this interval, so we take TAI and correct it so that it corresponds to the actual motion of the earth by adding corrections such as "leap seconds." TAI measures raw atomic time.
UTC measures time coordinated to the motion of the earth (i.e., so we don't end up having midnight while the sun is shining or January in midsummer). Details of what UTC really means, together with a more detailed history of timekeeping, can be found at http://ecco.bsee.swin.edu.au/chronos/GMT-explained.html.

UTC, time zones, and Windows file times

So what does this all have to do with file modification
times on Windows computers? Windows is stuck with some serious problems
integrating FAT and NTFS files compatibly. FAT records file modification times
with respect to the local time zone, while NTFS records file modification (as
well as creation and access times, which FAT does not record) in UTC. The first
question you may want to ask is, "How should Windows report these file
times?" Clearly it would be stupid for

It is straightforward to translate UTC to local time. You look up the offset, in minutes, between the local time zone and UTC, determine whether daylight savings is in effect, and add either the standard or the daylight offset to the UTC time. However, daylight time throws a subtle wrench in the works if we try to go backwards...

The problem with daylight time

If you want to translate a time in your local time zone into UTC, it seems a straightforward matter of determining whether daylight time is in effect locally and then subtracting either the standard or the daylight offset from the local time to arrive at UTC. A subtle problem emerges because the mapping from UTC to local time is not one-to-one. Specifically, when we leave daylight savings time and set our clocks back, there are two distinct hour-long intervals of UTC time that map onto the same hour-long interval of local time.

Consider the concrete case of 1:30 AM on the last Sunday in October. Let's suppose the local time zone is US Central Time (-6 hours offset from UTC when daylight time is not in effect, -5 hours when it is).

At 06:00 UTC on Sunday 28 October 2001, the time in the US Central zone will be 01:00 (1:00 AM) and daylight time will be in effect.
At 06:30 UTC, it will be 01:30 local.
At 07:00 UTC, it will be 01:00 local and daylight time will not be in effect.
At 07:30 UTC, it will be 01:30 local.

Thus, for all times 01:00 ≤ t < 02:00 local, there will be two distinct UTC times that correspond to the given local time. This degenerate mapping means that we can't be sure which UTC time corresponds to 01:30 local time. If a FAT file is marked as having been modified at 01:30 on 28 Oct. 2001, we can't determine the UTC time.

When translating local file times to UTC and vice-versa, Microsoft made a strange decision.
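The two-to-one mapping described above can be sketched in a few lines of portable C++. This is only an illustration under stated assumptions: it models US Central time on 28 October 2001 alone, counts time in minutes after 00:00 UTC on that date, and every name in it (utc_to_local, count_utc_candidates, and the constants) is hypothetical and not part of Win32 or of my library.

```cpp
#include <cassert>

// Hypothetical model of US Central time on Sunday 28 October 2001.
// All times are minutes after 00:00 UTC on that date.
const int kDstEndsUtcMin     = 7 * 60;   // 02:00 daylight time == 07:00 UTC
const int kDaylightOffsetMin = -5 * 60;  // daylight time = UTC - 5 h
const int kStandardOffsetMin = -6 * 60;  // standard time = UTC - 6 h

// UTC -> local is unambiguous: use the offset in effect at that instant.
int utc_to_local(int utc_min) {
    const bool daylight = utc_min < kDstEndsUtcMin;
    return utc_min + (daylight ? kDaylightOffsetMin : kStandardOffsetMin);
}

// Local -> UTC: count how many UTC minutes of the day map onto local_min.
int count_utc_candidates(int local_min) {
    int count = 0;
    for (int utc_min = 0; utc_min < 24 * 60; ++utc_min) {
        if (utc_to_local(utc_min) == local_min) {
            ++count;
        }
    }
    return count;
}

// Checks the concrete case from the text: 06:30 and 07:30 UTC are both
// 01:30 local, so 01:30 local has two UTC candidates.
void check_ambiguity() {
    assert(utc_to_local(6 * 60 + 30) == 90);
    assert(utc_to_local(7 * 60 + 30) == 90);
    assert(count_utc_candidates(90) == 2);
    assert(count_utc_candidates(150) == 1);  // 02:30 local is unambiguous
}
```

Calling check_ambiguity() from a test program exercises the case discussed above. A real conversion would take its offsets and transition dates from the system's time zone information rather than from hard-coded constants.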
We would like to have the following code produce out_time equal to in_time:

    FILETIME in_time, local_time, out_time;

    // assign in_time, then do this...
    FileTimeToLocalFileTime(&in_time, &local_time);
    LocalFileTimeToFileTime(&local_time, &out_time);

The problem is that if the local time zone is US Central (UTC - 6 hours for standard time, UTC - 5 hours for daylight time) then

It may seem strange that this would affect the C library call

    // pseudocode listing
    WIN32_FIND_DATA find_buf;
    HANDLE hFile;
    FILETIME local_ft;
    time_t mod_time;

    // FindFirstFile returns times in UTC.
    //
    // For NTFS files, it just returns the modification time
    // stored on the disk.
    //
    // For FAT files, it converts the modification time from
    // local (which is stored on the disk) to UTC using
    // LocalFileTimeToFileTime()
    //
    hFile = FindFirstFile ( file_name, &find_buf );

    // convert UTC mod time to local...
    FileTimeToLocalFileTime ( &find_buf.ftLastWriteTime, &local_ft );

    // Now use a private, undocumented function to convert local time to UTC
    // time according to the DST settings appropriate to the time being
    // converted!
    mod_time = __secret_microsoft_converter(local_ft);

For a FAT file, the conversions work like this:
For an NTFS file, the conversions work like this:
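As a rough illustration of both conversion chains, here is a small portable C++ model of the batch-file example. It assumes that FileTimeToLocalFileTime and LocalFileTimeToFileTime apply the daylight setting in effect when the conversion runs, which is how Microsoft documents them, rather than the setting in effect at the time being converted. It covers only US Central time, counts in minutes of the day on 27 October 2001, and all function and constant names are hypothetical stand-ins, not Win32 calls.

```cpp
#include <cassert>

// Hypothetical model: minutes of the day on 27 October 2001, US Central only.
const int kDaylightBiasMin = 5 * 60;  // UTC = local + 5 h under daylight time
const int kStandardBiasMin = 6 * 60;  // UTC = local + 6 h under standard time

// The bias depends on whether daylight time is in effect NOW, at the moment
// of the conversion, not at the moment recorded in the timestamp.
int current_bias(bool dst_now) {
    return dst_now ? kDaylightBiasMin : kStandardBiasMin;
}

// Stand-in for LocalFileTimeToFileTime.
int local_to_utc_now(int local_min, bool dst_now) {
    return local_min + current_bias(dst_now);
}

// Stand-in for FileTimeToLocalFileTime.
int utc_to_local_now(int utc_min, bool dst_now) {
    return utc_min - current_bias(dst_now);
}

// FAT stores local time: the reported UTC value depends on today's bias...
int fat_reported_utc(int stored_local_min, bool dst_now) {
    return local_to_utc_now(stored_local_min, dst_now);
}

// ...but converting back with the same bias restores the stored local time.
int fat_displayed_local(int stored_local_min, bool dst_now) {
    return utc_to_local_now(fat_reported_utc(stored_local_min, dst_now), dst_now);
}

// NTFS stores UTC: only the UTC -> local step runs, so the display shifts.
int ntfs_displayed_local(int stored_utc_min, bool dst_now) {
    return utc_to_local_now(stored_utc_min, dst_now);
}

// Foo.txt: FAT, written at 10:00 local. Bar.txt: NTFS, written at 10:30
// daylight time, which is 15:30 UTC.
void check_batch_example() {
    assert(fat_displayed_local(600, true) == 600);    // 27 Oct: 10:00
    assert(fat_displayed_local(600, false) == 600);   // 28 Oct: 10:00
    assert(ntfs_displayed_local(930, true) == 630);   // 27 Oct: 10:30
    assert(ntfs_displayed_local(930, false) == 570);  // 28 Oct: 09:30
}
```

In this model the FAT file always displays the same local time while its reported UTC time moves by an hour, and the NTFS file keeps the same UTC time while its displayed local time moves by an hour.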
This explains the problem I showed at the top of this article: The time
reported by

Categorizing the problem

There are three ways I can think of in which this inconsistency in reporting file times may cause problems:
Solutions: