ZDNet UK



Unicode report reveals risk of software crashes

Paul Festa, CNET News.com

Published: 17 Jun 2003 10:23 BST


The character set that lets computers write in every language from Czech to Chinese could make Web browsers tongue-tied, two standards groups warned on Friday.

Published by the Unicode Consortium, Unicode is a standard character set for computers that aims to assign a number for every character in every written language. XML (Extensible Markup Language), a World Wide Web Consortium (W3C) recommendation for marking up digital documents and creating new mark-up languages for specific tasks or industries, relies on Unicode and closely tracks its revisions.

But a technical report released by the Unicode Consortium -- and simultaneously published as a note by the W3C's internationalisation activity -- warns document authors that some Unicode features are going to cause XML applications, HTML browsers, and other programs to choke.

Conflict arises between Unicode and Web mark-up languages from the fundamentally different philosophies that underlie the character set and Web standards. While Unicode produces a one-for-one, linear correspondence for every character on the page, XML and its Web-based relatives are more flexible in that they let authors assign different style and functional attributes to a single character, word or page.

For example, Unicode provides so-called "compatibility characters": separate numbers that designate superscript or subscript numerals or letters. With HTML or XML, by contrast, the author would use the basic character and then style it as superscript or subscript.

All things being equal, the W3C advises authors to use the mark-up alternatives.
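As a rough illustration of the mark-up alternative, the Python sketch below rewrites superscript and subscript compatibility characters as a base character wrapped in HTML `<sup>` or `<sub>` elements. It relies on the decomposition data in Python's standard `unicodedata` module; the function name `compat_to_markup` and the tag mapping are hypothetical, chosen for this example rather than taken from the report.

```python
import unicodedata
from html import escape

# Compatibility-decomposition tags from the Unicode Character Database,
# mapped to the HTML elements an author could use instead (assumed mapping).
TAG_TO_ELEMENT = {"<super>": "sup", "<sub>": "sub"}


def compat_to_markup(text: str) -> str:
    """Replace super/subscript compatibility characters with styled base characters."""
    out = []
    for ch in text:
        # For U+00B2 (superscript two) this returns "<super> 0032".
        parts = unicodedata.decomposition(ch).split()
        if parts and parts[0] in TAG_TO_ELEMENT:
            element = TAG_TO_ELEMENT[parts[0]]
            # The remaining fields are the hex code points of the base character(s).
            base = "".join(chr(int(cp, 16)) for cp in parts[1:])
            out.append(f"<{element}>{escape(base)}</{element}>")
        else:
            out.append(escape(ch))
    return "".join(out)


print(compat_to_markup("x\u00b2 + H\u2082O"))
# prints: x<sup>2</sup> + H<sub>2</sub>O
```

The sketch handles only the superscript and subscript cases named above; other kinds of compatibility characters would need their own treatment.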

Compatibility characters are "just not the long-term, sound way to do things," said Martin Duerst, the W3C's internationalisation activity lead and a visiting scientist at the Massachusetts Institute of Technology's Laboratory for Computer Science. "We're urging authors to use Unicode in a responsible and adequate way when it's used with XML."

Many times, authors know that their Unicode is destined to be read by Web browsers and other XML applications. But some of the conflicts crop up as a surprise when XML applications are fed information from older databases and information repositories.

That's when applications that are designed for mark-up languages start stuttering on characters that designate things like vertical tabulators, form feeds and other controls.
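To see why such characters trip up XML software, consider that XML 1.0 permits only tab, line feed and carriage return among the ASCII control characters. The Python sketch below, using hypothetical helper names, scans text pulled from a legacy source for characters outside the XML 1.0 range and shows that a standard parser refuses a document containing them.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
ILLEGAL_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_illegal_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs for characters XML 1.0 does not allow."""
    return [(m.start(), f"U+{ord(m.group()):04X}")
            for m in ILLEGAL_XML_CHAR.finditer(text)]


def parses_as_xml(text: str) -> bool:
    """Wrap text in a dummy element and report whether the parser accepts it."""
    try:
        ET.fromstring(f"<record>{escape(text)}</record>")
        return True
    except ET.ParseError:
        return False


legacy = "Page one\x0bPage two\x0c"  # vertical tab and form feed from an old system
print(find_illegal_xml_chars(legacy))
print(parses_as_xml(legacy))
```

Whether to drop the offending characters or replace them with structural mark-up depends on what they meant in the original system, so the sketch only reports them.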

"In the report we go through a lot of different kinds of characters that, in one way or another, may make sense in a legacy system or in plain text, but once you have mark-up at your disposition, you can use structure," Duerst said. "You want to use structure instead of a character, a number. If you're using XML, use what XML makes available. Control character stuff really doesn't work."

The fourth version of Unicode will be out in book form later this year. Prepublication versions of Unicode 4.0 are available online now.

