Unicode report reveals risk of software crashes

Paul Festa, CNET News.com

Published: 17 Jun 2003 10:23 BST

The character set that lets computers write in every language from Czech to Chinese could make Web browsers tongue-tied, two standards groups warned on Friday.

Published by the Unicode Consortium, Unicode is a standard character set for computers that aims to assign a number for every character in every written language. XML (Extensible Markup Language), a World Wide Web Consortium (W3C) recommendation for marking up digital documents and creating new mark-up languages for specific tasks or industries, relies on Unicode and closely tracks its revisions.
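
In practice that number is a code point, conventionally written as "U+" followed by hexadecimal digits. As a small illustration, here is a minimal Python sketch using only the standard `unicodedata` module (the `describe` helper is a hypothetical name chosen for this example) showing how a Czech letter, a Chinese ideograph and a plain ASCII letter each map to a single code point and a formal Unicode name:

```python
import unicodedata

def describe(ch: str) -> str:
    """Return a character's code point in U+XXXX notation plus its Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# One Czech letter (C with caron), one Chinese ideograph, one ASCII letter.
for ch in ["\u010c", "\u4e2d", "A"]:
    print(describe(ch))
# U+010C LATIN CAPITAL LETTER C WITH CARON
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
# U+0041 LATIN CAPITAL LETTER A
```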

But a technical report released by the Unicode Consortium -- and simultaneously published as a note by the W3C's internationalisation activity -- warns document authors that some Unicode features are going to cause XML applications, HTML browsers, and other programs to choke.

The conflict between Unicode and Web mark-up languages stems from the fundamentally different philosophies underlying the character set and Web standards. Unicode establishes a one-to-one, linear correspondence for every character on the page, whereas XML and its Web-based relatives are more flexible: they let authors assign different style and functional attributes to a single character, word or page.

For example, Unicode provides what are called "compatibility characters": separate numbers that designate superscript or subscript numerals and letters. With HTML or XML, by contrast, the author would use the basic character and then style it as superscript or subscript.
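
To make the contrast concrete, here is a minimal Python sketch (standard library only; `compat_to_markup` is a hypothetical helper, not something the report defines) that rewrites superscript and subscript compatibility characters as HTML `<sup>` and `<sub>` mark-up. It relies on the fact that Unicode's decomposition mapping for such characters is tagged `<super>` or `<sub>` and names the plain base character:

```python
import html
import unicodedata

def compat_to_markup(text: str) -> str:
    """Rewrite superscript/subscript compatibility characters as HTML mark-up.

    For a compatibility superscript or subscript, unicodedata.decomposition()
    returns a mapping tagged "<super>" or "<sub>", followed by the plain base
    character(s) as hex code points, e.g. "<super> 0032" for U+00B2.
    """
    out = []
    for ch in text:
        tag, _, rest = unicodedata.decomposition(ch).partition(" ")
        if tag in ("<super>", "<sub>"):
            base = "".join(chr(int(cp, 16)) for cp in rest.split())
            element = "sup" if tag == "<super>" else "sub"
            out.append(f"<{element}>{html.escape(base)}</{element}>")
        else:
            # Ordinary characters pass through, escaped so the output stays well-formed.
            out.append(html.escape(ch, quote=False))
    return "".join(out)

print(compat_to_markup("E = mc\u00b2"))           # E = mc<sup>2</sup>
print(compat_to_markup("H\u2082O"))               # H<sub>2</sub>O

# NFKC normalization, by contrast, simply drops the superscript distinction.
print(unicodedata.normalize("NFKC", "mc\u00b2"))  # mc2
```

The last line shows the flip side: NFKC, one of Unicode's standard normalization forms, folds the superscript two into an ordinary digit, so the formatting that the compatibility character carried is silently lost.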

All other things being equal, the W3C advises authors to use the mark-up alternatives.

Compatibility characters are "just not the long-term, sound way to do things," said Martin Duerst, the W3C's internationalisation activity lead and a visiting scientist at the Massachusetts Institute of Technology's Laboratory for Computer Science. "We're urging authors to use Unicode in a responsible and adequate way when it's used with XML."

Often, authors know that their Unicode text is destined to be read by Web browsers and other XML applications. But some of the conflicts come as a surprise when XML applications are fed information from older databases and information repositories.

That's when applications designed for mark-up languages start stuttering on characters that designate things like vertical tabs, form feeds and other controls.
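
A conforming XML 1.0 parser rejects such characters outright: the specification's `Char` production admits only tab, line feed and carriage return from the C0 control range, so a raw vertical tab (U+000B) or form feed (U+000C) makes a document not well-formed. The minimal Python sketch below (standard library only; the `legacy_to_xml` helper and the `<doc>`/`<page>` element names are hypothetical) shows the failure and one way to replace a control character with mark-up structure: page breaks become elements, and any other disallowed character is swapped for a space.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 'Char' production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def legacy_to_xml(text: str) -> ET.Element:
    """Convert form-feed-separated legacy text into <page> elements.

    The <doc>/<page> vocabulary is hypothetical. Page breaks become
    structure; any other character XML 1.0 forbids is replaced by a space.
    """
    root = ET.Element("doc")
    for chunk in text.split("\x0c"):     # U+000C FORM FEED marks a page break
        page = ET.SubElement(root, "page")
        page.text = "".join(ch if is_xml10_char(ch) else " " for ch in chunk)
    return root

# Text as it might arrive from an older system: a vertical tab and a form feed.
legacy = "Page one\x0bindented\x0cPage two"

try:
    ET.fromstring(f"<p>{legacy}</p>")    # a conforming parser must reject this
except ET.ParseError as err:
    print("rejected:", err)

print(ET.tostring(legacy_to_xml(legacy), encoding="unicode"))
# <doc><page>Page one indented</page><page>Page two</page></doc>
```

Swapping in a space is a blunt fallback; mapping each legacy control to a meaningful element, as done here for the form feed, is closer to the advice that follows.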

"In the report we go through a lot of different kinds of characters that, in one way or another, may make sense in a legacy system or in plain text, but once you have mark-up at your disposition, you can use structure," Duerst said. "You want to use structure instead of a character, a number. If you're using XML, use what XML makes available. Control character stuff really doesn't work."

The fourth version of Unicode will be out in book form later this year. Prepublication versions of Unicode 4.0 are available online now.
