Windows shell reads XML processing instructions

Tuesday 28 June 2005

I was looking at a file with an extension of .xml in Windows Explorer today. It had an unfamiliar icon associated with it. I tried dragging it onto Internet Explorer to see the XML, and was asked if I wanted to open it. I answered yes, and Office InfoPath opened! "Since when does InfoPath own .xml files?" I wondered. I'd never used InfoPath before, and hadn't recently installed it or anything.

Looking in the XML file, I saw this processing instruction:

<?mso-application progid="InfoPath.Document"?>

Hmmm, this seems to be an application linkage for the file! Sure enough, when I edited the progid to make it bogus, the icon in Explorer changed to a generic XML file icon, and I could now drop the file on IE and have it show me the XML.
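To make the idea concrete, here is a minimal Python sketch of the kind of check involved: look through the top level of the document for an mso-application processing instruction and pull out its progid. This is only an illustration, not how the shell actually does it (I don't know its implementation). The names find_progid and PSEUDO_ATTR are made up for this example.

```python
import re
from xml.dom import minidom, Node

# Matches pseudo-attributes such as progid="InfoPath.Document" inside a PI.
# (Hypothetical helper: a PI's content is free-form text, so it is parsed by hand.)
PSEUDO_ATTR = re.compile(r'''([\w.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')''')


def find_progid(xml_text):
    """Return the progid from a top-level mso-application PI, or None."""
    doc = minidom.parseString(xml_text)
    # Processing instructions in the prolog are direct children of the document.
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "mso-application"):
            for name, double_quoted, single_quoted in PSEUDO_ATTR.findall(node.data):
                if name == "progid":
                    return double_quoted or single_quoted
    return None


sample = (
    '<?xml version="1.0"?>\n'
    '<?mso-application progid="InfoPath.Document"?>\n'
    '<root/>'
)
print(find_progid(sample))  # InfoPath.Document
```

A file without the processing instruction simply yields None, which corresponds to the generic XML icon case above.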

I'd never heard of this before, but it seems useful, and necessary. If XML is going to be a universal data solvent, then we'll need some way other than file extensions to determine how to launch applications from documents. I just had no idea the shell was willing to parse XML to find the application.

Of course, the progid can be any progid. For instance, here's an example of using XSLT to create Word documents that uses this processing instruction to hook the data up with the application.

Comments

Adam Vandenberg 10:17 PM on 28 Jun 2005

"If XML is going to be a universal data solvent, then we'll need some way other than file extensions to determine how to launch applications from documents."

Why would every program that uses an XML based format use a .xml extension? Why not use a different extension?

(Though this PI is useful anyway.)

andrew 10:33 PM on 28 Jun 2005

HA! Part of The Plan to "embrace and extend" XML!

Michael Baltaks 11:51 PM on 28 Jun 2005

So creator codes are making a comeback? What happens when you want to override this to open with some other software? I'm not sure embedding a software choice in the actual file itself is the best idea.

As for the file name extensions business, see here for a much better way to assign data types to files. But being able to tell a file's type is a separate issue to deciding what software should open/edit/print/etc it.

Fredrik 3:07 AM on 29 Jun 2005

The fact that Ned's example says "InfoPath.Document" and not "Microsoft.InfoPath.Program" should tell you that it's not a creator code, but I suppose it's more important to you to brag about some apple solution than to learn how things work on some other platform...

(since when is adding a PI "extending xml", btw?)

Ned Batchelder 6:44 AM on 29 Jun 2005

I didn't mean to start (or join in progress) a holy war. Fredrik: the progid doesn't have Microsoft in it, but the fact is that it is a "progid", meaning basically the name of a program, not a data format.

Oh, and anything Andrew says has to be taken with (at least) a grain of salted humor...

Chris Smith 8:51 AM on 29 Jun 2005

XML has successfully re-invented #!
And there was much rejoicing!
(yay)

Tim Lesher 10:37 AM on 29 Jun 2005

Actually, Microsoft have made a number of attempts to improve on extension-based file typing. The one that I thought would have taken off is the "FileType" registry key from around 1995, which associates file signatures and masks with applications. I don't know why that never took off, because it seemed almost good enough to work.

andrew 11:06 AM on 29 Jun 2005

Sometimes geeks are wayyyyy too serious about this shit.

Sylvain Galineau 2:40 PM on 29 Jun 2005

Cool. Can an HTTP server return the same processing instruction, causing progids to be invoked by IE, Word documents to be created with who-knows-what? That could be fun.

polaar 2:55 PM on 29 Jun 2005

While it is certainly useful, it's a pity it doesn't make use of media types (RFC 3023 addresses the need for more than just text/xml or application/xml).
In fact it would be nice if the W3C made a recommendation in the vein of http://www.w3.org/TR/xml-stylesheet/, but for media types:
something like this:
<?xml-type application/ms-infopath.xml?>
(This seems to be the correct media type for InfoPath)

It would be more standardized, and would also make it easier to serve xml files with the right type.

Michael Chermside 4:59 PM on 15 Sep 2005

I have a sad feeling that I'm too late here and the discussion has moved on, but I'm going to try anyway. I finally got around to playing with this, and I find that I can add processing instructions like <?mso-application progid="InfoPath.Document"?> to cause the document to open in InfoPath, and <?mso-application progid="Word.Document"?> to cause the document to open in Word. But how can I cause the document to open in my OWN application?

I'm sure that Windows is looking up the "progid" string someplace in the registry and then from that determining what executable to launch, but I can't piece together how it works to enable me to leverage it myself. Any hints?

Ned Batchelder 9:12 AM on 16 Sep 2005

The progid is a COM construct. To launch your application, it will need to be a COM server. This is a big complicated topic, but here's a brief page about the registration of progids in the registry: http://izfree.sourceforge.net/tut06.html

bryan 12:39 PM on 26 Jan 2006

You haven't convinced me that Windows actually uses the shell to do this. It could be that they've just added functionality to msxml to do it: if you open an XML file by double-clicking, it defaults to Internet Explorer, which then checks for this progid and, on finding it, redirects via code in msxml. I suppose the security areas (local, internet, etc.) would then be used to determine when the XML document is actually opened by IE and when by Word.
There are of course lots of problems with this way of doing it, but damn, it sure would seem wasteful to have the OS checking against the .xml extension all the time.
