[LON-CAPA-dev] Problems with images in HTML files created with
MS-Word
Jeremy Bowers
lon-capa-dev@mail.lon-capa.org
Tue, 28 Sep 2004 02:29:20 -0500
Ricardo Luis Kulzer wrote:
> *<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v =
> "urn:schemas-microsoft-com:vml" xmlns:o =
> "urn:schemas-microsoft-com:office:office" xmlns:w =
> "urn:schemas-microsoft-com:office:word"><HEAD><TITLE>1</TITLE>
> <META http-equiv=Content-Type content="text/html; charset=windows-1252">
> <META content=Word.Document name=ProgId>
> <META content="MSHTML 6.00.2800.1458" name=GENERATOR>
> ...*
This is probably not helpful directly to you, Ricardo, but this may
interest other developers on the list.
I would like to observe that this is *flagrantly* illegal HTML, stunning
even by Microsoft standards (and I've been dealing with Microsoft "HTML"
since the first tentative steps with Office and Front Page were taken,
so I've about seen it all). "xmlns" attributes are only defined for XML
(and by extension XHTML, which imports them from the XML standard). Yet
later tags like META are opened but never closed, attribute values are
left unquoted, and in general it does not even remotely resemble an XML
file: any compliant XML parser would gag on the first META tag.
(Assuming the spaces on the attributes are just email bugaboos.) Oh, and
the fake XHTML has the wrong case (all XHTML tag names are lowercase and
case sensitive).
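To make the "any compliant XML parser would gag" claim concrete, here is
a minimal Python sketch. It assumes a trimmed-down copy of the quoted
header (DOCTYPE and the extra xmlns declarations dropped, spaces around
"=" normalized), and WORD_HEAD, CLEANED_HEAD and is_well_formed are names
I made up for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Word-generated header quoted above (an assumption:
# DOCTYPE and extra xmlns declarations removed to isolate the META tag).
WORD_HEAD = (
    '<HTML xmlns="http://www.w3.org/TR/REC-html40">'
    '<HEAD><TITLE>1</TITLE>'
    '<META http-equiv=Content-Type content="text/html; charset=windows-1252">'
    '</HEAD></HTML>'
)

# Hand-written well-formed equivalent: lowercase tag names, quoted
# attribute values, and a self-closed meta element.
CLEANED_HEAD = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>1</title>'
    '<meta http-equiv="Content-Type" '
    'content="text/html; charset=windows-1252" />'
    '</head></html>'
)


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("Word header well-formed:   ", is_well_formed(WORD_HEAD))
    print("Cleaned header well-formed:", is_well_formed(CLEANED_HEAD))
```

The parser gives up at the unquoted attribute value in the META tag; even
with that quoted, the unclosed META would fail next.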
For maximal compatibility with your students' machines, I'd echo Gerd's
recommendation to run generated HTML through a well-known Microsoft
HTML/XHTML fixer. Microsoft products do not really produce HTML; they
just produce something vaguely *like* HTML, and you may find there are
other situations where you will have better luck with cleaned-up
Microsoft HTML than with the originals.
You can also download the free (open source) standalone Tidy program for
Windows here: http://www.paehl.de/tidy/
In fact the sample screenshots show it taking care of a
Microsoft-generated document.
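If you have many Word exports to clean, the Tidy run can be scripted. A
rough Python sketch, assuming a command-line "tidy" executable is on the
PATH and that your build accepts the usual HTML Tidy options (-q,
-asxhtml, --word-2000, -o); tidy_command and clean_word_html are
hypothetical helper names, not part of Tidy or LON-CAPA:

```python
import shutil
import subprocess


def tidy_command(src, dst):
    """Build the Tidy command line for one Word-generated file.

    -q               suppress non-essential output
    -asxhtml         convert the input to XHTML
    --word-2000 yes  strip Word-specific markup where possible
    -o dst           write the cleaned document to dst
    """
    return ["tidy", "-q", "-asxhtml", "--word-2000", "yes", "-o", dst, src]


def clean_word_html(src, dst):
    """Run Tidy on src, writing cleaned XHTML to dst.

    Returns Tidy's exit status: 0 (no problems) or 1 (warnings only).
    Raises RuntimeError if Tidy reports errors (exit status 2 or more).
    """
    if shutil.which("tidy") is None:
        raise FileNotFoundError("tidy executable not found on PATH")
    result = subprocess.run(
        tidy_command(src, dst), capture_output=True, text=True
    )
    if result.returncode >= 2:
        raise RuntimeError(result.stderr)
    return result.returncode
```

Something like clean_word_html("lecture1.htm", "lecture1-clean.html")
(file names made up) would then do the conversion; check the warnings
Tidy prints before uploading the result.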
(Adding something about this to the documentation would probably be a
good idea :-) )