
Why are the module timestamps in Windows 10 so nonsensical?

One of the fields in the Portable Executable (PE) header is called TimeDateStamp.
It's a 32-bit value representing the time the file was created,
in the form of seconds since January 1, 1970 UTC.
But starting in Windows 10, those timestamps are all nonsense.
If you look at the timestamps of various files, you'll see that
they appear to be random numbers, completely unrelated to any actual time.
What's going on?

One of the changes to the Windows engineering system that began in Windows 10
is the move toward reproducible builds.
This means that if you start with the exact same source code,
then you should finish with the exact same binary code.

There are lots of things that hamper reproducibility.
One source is the language itself.
For example,
an anonymous namespace has no programmatically-accessible name,
but the objects within it still need names in the compiled output,
and those names must be different for different source files.
How does the compiler ensure the names are unique?
Does it use a random number generator to generate these names?
Is it a hash of the file name?
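A random number generator would make every build different. One reproducible alternative, sketched here under my own assumptions and not a description of what any particular compiler does, is to derive the name from a hash of the translation unit's contents. The helper `anonymous_namespace_tag` and the `anon_` prefix are hypothetical, chosen purely for illustration.

```python
import hashlib

def anonymous_namespace_tag(source_text: str) -> str:
    """Hypothetical helper: derive a per-file name for an anonymous namespace.

    The tag depends only on the file's contents, so building the same source
    twice (on any machine, from any directory) yields the same tag, while
    different files almost certainly get different tags.
    """
    digest = hashlib.sha256(source_text.encode("utf-8")).hexdigest()
    return "anon_" + digest[:8]

a = "namespace { int counter; }\nint f() { return counter; }\n"
b = "namespace { int counter; }\nint g() { return counter; }\n"

print(anonymous_namespace_tag(a) == anonymous_namespace_tag(a))  # True: same input, same name
print(anonymous_namespace_tag(a), anonymous_namespace_tag(b))    # two different tags
```

Hashing the contents rather than the file name sidesteps the directory-layout problem, but it has its own catch: two byte-identical source files would get the same tag, so a real scheme would need to mix in something else that is also stable across builds.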

Another source is the compiler's internal code generation algorithms.
For example, if a compiler chooses between two optimizations depending on
how much RAM is available, or how powerful the processor is,
then that prevents the result from being reproducible because two
systems with different hardware configurations may end up producing
different outputs.
Or if the optimizer has a failsafe switch that abandons an operation
if the algorithm is still running after 500ms.
Or if the optimizer uses a
non-deterministic register allocation strategy.
Or if the compiler uses a deterministic algorithm
("sort all local variables") but uses a non-deterministic
criterion ("... by the heap address of the data structure we use
to keep track of each variable").
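Here is that last trap in miniature. This is a Python sketch, not compiler code; `LocalVariable` and both ordering functions are hypothetical. It runs the same sort with two different keys. In CPython, `id()` returns the object's memory address, so sorting by it plays the role of "sort by heap address."

```python
from dataclasses import dataclass

@dataclass
class LocalVariable:
    name: str
    size: int

def order_by_address(variables):
    # Deterministic algorithm, non-deterministic criterion: in CPython, id()
    # is the object's address, which depends on allocation history.
    return sorted(variables, key=id)

def order_by_name(variables):
    # Same algorithm, but the key comes purely from the program being
    # compiled, so every build produces the same order.
    return sorted(variables, key=lambda v: (v.name, v.size))

locals_ = [LocalVariable("total", 4), LocalVariable("buffer", 256), LocalVariable("i", 4)]

print([v.name for v in order_by_name(locals_)])     # ['buffer', 'i', 'total']
print([v.name for v in order_by_address(locals_)])  # depends on where the objects landed in memory
```

Both functions always return the same three variables; only the name-keyed version is guaranteed to return them in the same order on every run and every machine.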

There are also inputs to the system outside the compiler that
hamper reproducibility.
For example, the full path to the file being compiled will show
up wherever the __FILE__ macro is used,
which will cause problems when the code is built on different machines
with different names for the root directory that holds the source code.
(Or even from the same machine with two copies of the source code.)
There may be files auto-generated by the build process that go into
the compiler (for example, the output of compiler-compilers);
those need to be deterministic too.
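A common way to neutralize the path problem is to rewrite the build-specific root of each path to a fixed placeholder before it gets baked into the output. GCC's `-fmacro-prefix-map=OLD=NEW` option works along these lines. The Python helper below is a hypothetical sketch of the idea, not any compiler's implementation, and it assumes forward-slash paths for simplicity.

```python
from pathlib import PurePosixPath

def remap_file_macro(path: str, source_root: str, replacement: str = "/src") -> str:
    """Hypothetical helper: rewrite what a __FILE__-style macro would expand to
    so it no longer depends on where the source tree happens to live."""
    p = PurePosixPath(path)
    root = PurePosixPath(source_root)
    try:
        relative = p.relative_to(root)
    except ValueError:
        return path  # not under the source root (e.g. a system header): leave it alone
    return str(PurePosixPath(replacement) / relative)

# Two machines (or two copies on one machine) with different enlistment roots:
print(remap_file_macro("/home/alice/proj/lib/util.c", "/home/alice/proj"))  # /src/lib/util.c
print(remap_file_macro("/build/bob/proj/lib/util.c", "/build/bob/proj"))    # /src/lib/util.c
```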

Timestamps are another source of non-determinism.
Even if all the inputs are identical,
the outputs will still be different because of the timestamps.

Okay, at least we can fix the issue with the file format.
Setting the timestamp to be a hash of the resulting binary
preserves reproducibility.
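One subtlety is that the hash covers the binary, and the binary contains the timestamp field, so the field has to be blanked before hashing or the result would depend on whatever value was there before. The Python sketch below shows the idea. The post doesn't say which hash the Windows build tools use, so truncating SHA-256 to 32 bits is my assumption. The sketch only touches TimeDateStamp in the COFF file header; a real tool would also have to deal with other timestamp fields (such as those in the debug directory) and fix up the optional header's CheckSum. `stamp_with_hash` and `fake_image` are hypothetical.

```python
import hashlib
import struct

PE_SIGNATURE = b"PE\0\0"

def timestamp_offset(image) -> int:
    # e_lfanew (at offset 0x3C of the DOS header) points at the "PE\0\0" signature.
    # The COFF file header follows it: Machine (2 bytes), NumberOfSections
    # (2 bytes), then TimeDateStamp (4 bytes).
    if image[:2] != b"MZ":
        raise ValueError("not an MZ image")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != PE_SIGNATURE:
        raise ValueError("PE signature not found")
    return e_lfanew + 8

def stamp_with_hash(image: bytes) -> bytes:
    """Replace TimeDateStamp with 32 bits derived from the image contents."""
    buf = bytearray(image)
    off = timestamp_offset(buf)
    struct.pack_into("<I", buf, off, 0)              # blank the field first
    digest = hashlib.sha256(buf).digest()
    (value,) = struct.unpack_from("<I", digest, 0)   # assumption: first 4 bytes of SHA-256
    struct.pack_into("<I", buf, off, value)
    return bytes(buf)

def fake_image(timestamp: int) -> bytes:
    # Not a loadable PE: just enough structure for timestamp_offset() to work.
    buf = bytearray(0x100)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)
    buf[0x80:0x84] = PE_SIGNATURE
    struct.pack_into("<I", buf, 0x88, timestamp)
    buf[0xA0:0xA5] = b"code!"
    return bytes(buf)

build1 = fake_image(0x5A000000)
build2 = fake_image(0x5A015180)  # identical "code", built one day later
print(build1 == build2)                                    # False
print(stamp_with_hash(build1) == stamp_with_hash(build2))  # True
```

A nice side effect of blanking the field first is that the operation is idempotent: stamping an already-stamped image gives back the same bytes.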

"Okay, but why not set the file timestamp to
the timestamp of the source code
the binary was created from?
That way, it's still a timestamp at least."
That still breaks reproducibility, because that means that touching a file
without making any changes will result in a change in binary output.
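You can see the difference with a quick experiment, sketched here in Python with hypothetical helpers: "touch" a source file so that its modification time changes but its contents don't, then compare an mtime-based stamp with a content-based one. Using 32 bits of SHA-256 for the content stamp is an arbitrary choice for illustration.

```python
import hashlib
import os
import tempfile

def mtime_stamp(path: str) -> int:
    # The rejected approach: use the source file's modification time.
    return int(os.stat(path).st_mtime)

def content_stamp(path: str) -> int:
    # A content-derived ID: 32 bits of a hash of the file's bytes.
    with open(path, "rb") as f:
        return int.from_bytes(hashlib.sha256(f.read()).digest()[:4], "little")

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "widget.cpp")
    with open(src, "w") as f:
        f.write("int answer() { return 42; }\n")
    os.utime(src, (1_500_000_000, 1_500_000_000))
    before = (mtime_stamp(src), content_stamp(src))
    # "touch" the file: new modification time, same contents
    os.utime(src, (1_600_000_000, 1_600_000_000))
    after = (mtime_stamp(src), content_stamp(src))

print(before[0] == after[0])  # False: the mtime-based stamp changed
print(before[1] == after[1])  # True: the content-based stamp did not
```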

Remember what the timestamp is used for:
It's used by the module loader to determine
whether bound imports should be trusted.
We've already seen cases where the timestamp is inaccurate.
For example, if you rebind a DLL, then the rebound DLL has the same
timestamp as the original,
rather than the timestamp of the rebind,
because you don't want to break the bindings of other DLLs that bound to
your DLL.

So the timestamp is already unreliable.

The timestamp is really a unique ID that tells the loader,
"The exports of this DLL have not changed since the last time
anybody bound to it."
And a hash is a reproducible unique ID.
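To see why any unique ID will do, here is a deliberately simplified Python model of the decision. `BoundImport`, `resolve_imports`, and `lookup_by_name` are hypothetical, and the real loader checks more than this (for example, whether the DLL had to be relocated). The key point is that the stamp is only ever compared for equality; nobody does date arithmetic on it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundImport:
    # Hypothetical model of what binding records about one imported DLL.
    dll_name: str
    dll_stamp: int          # the DLL's TimeDateStamp at bind time
    bound_addresses: tuple  # precomputed addresses of the imported functions

def resolve_imports(bound: BoundImport, loaded_dll_stamp: int, slow_lookup):
    """Trust the precomputed addresses only if the DLL still has the same ID."""
    if bound.dll_stamp == loaded_dll_stamp:
        return bound.bound_addresses       # equality is the only test
    return slow_lookup(bound.dll_name)     # ID changed: resolve the imports the slow way

def lookup_by_name(dll_name):
    # Stand-in for walking the DLL's export table at load time.
    return ("looked-up",)

record = BoundImport("contoso.dll", 0x1A2B3C4D, (0x10001000, 0x10001040))
print(resolve_imports(record, 0x1A2B3C4D, lookup_by_name) == record.bound_addresses)  # True
print(resolve_imports(record, 0x5E6F7081, lookup_by_name))  # ('looked-up',)
```

Whether `dll_stamp` holds seconds since 1970 or a content hash makes no difference to this check, as long as it changes when the DLL's exports change.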
