Creation Zone

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Friday, 29 July 2005

Sun Studio C/C++: Support for UTF-16 String Literals

Posted on 17:33 by Unknown
Characters in general are stored as ASCII codes. ASCII uses one byte (8 bits) to store each character, and hence can store a maximum of 256 (28) characters. Due to this limitation, ASCII is not capable of accomodating new characters to support other languages like Chinese, Japanese, Indic languages etc. The existing character set (ASCII) is adequate, as long as we restrict ourselves to english alphabet and most commonly used punctuation and technical symbols. But this is not the case, if we want to extend our applications to support quite a number of international languages.

Unicode

This problem can be alleviated by increasing the number of bits used to store each character. Unicode standard was evolved to specify the representation of text in modern software products and standards; and as a result data will be transported through many different systems without corruption.

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language is. The Unicode standard defines three encoding forms that allow the same data to be transmitted in a byte (8 bits), word (16 bits) or double word (32 bits) oriented format. All these encoding forms can be efficiently transformed into one another without any loss of data. Of these three encoding forms, UTF-16 is extremely popular because most of the heavily used characters fit into one 16-bit code unit ie., 65,536 (216) characters; and it occupies only 2 bytes in memory.

To represent Unicode characters, the char data type is not suitable; and hence we cannot use the string routines supplied with standard C library, on Unicode data. To represent UTF-16 characters, unsigned short data type can be used, since it occupies 2 byte storage. Also it avoids the need for introducing a new data type into compilers.

Sun's support for UTF-16 string literals

16-bit character string literals are not part of C/C++ standard yet. So, to address the needs of customers who develop/support internationalized applications, Sun introduced limited support for string literals of 16-bit (UTF-16 and UCS2) characters, as a language extension in Sun Studio Compiler Collection 8.

By default, the C/C++ compiler doesn't recognize the 16-bit character string literal. To make it recognize the 16-bit character string literals and to convert 'em to UTF-16 strings in the object file, we need to use the compiler switch -xustr=ascii_utf16_ushort. Since U"ASCII_String" syntax may become standard (in the future), Sun adopted this syntax to form 16-bit character string literals. Note that a non ASCII character in a character or string literal is an error.

eg., U"SomeString";

A literal character (U'c') has type const unsigned short, and a literal string (U"string") has type array of const unsigned short

Be aware that there is no library of supporting routines for such strings or characters. The users have to write their own string handling and I/O routines. One obvious reason for the lack of supporting library is being non-standard; and it is not easy to predict what will eventually be adopted by the standards committee(s). In the worst case, Sun may end up supporting a library that conflicts with a standard library.

Here's an example:
% cat unicode.c
const unsigned short *dummy = U"dummy";
const unsigned short unicodestr[] = U"UnicodeString";

const unsigned short *greet() {
return U"Hello!";
};
This code has to be compiled with -xustr=ascii_utf16_ushort option, for the compiler to recognize and convert these string literals to UTF-16 strings in the object file. Note that the compiler option -xustr=ascii_utf16_ushort is the same for both C and C++ compilers

C
% cc -w -c unicode.c
"unicode.c", line 1: undefined symbol: U
"unicode.c", line 1: non-constant initializer: op "NAME"
"unicode.c", line 1: syntax error before or at: "dummy"
"unicode.c", line 2: non-constant initializer: op "NAME"
"unicode.c", line 2: syntax error before or at: "UnicodeString"
"unicode.c", line 5: syntax error before or at: "Hello!"
cc: acomp failed for unicode.c

% cc -w -c -xustr=ascii_utf16_ushort unicode.c

% ls -l unicode.o
-rw-rw-r-- 1 gmandali ccuser 1376 Jul 29 18:19 unicode.o
C++
% CC -c unicode.c
"unicode.c", line 1: Error: U is not defined.
"unicode.c", line 1: Error: Badly formed expression.
"unicode.c", line 2: Error: U is not defined.
"unicode.c", line 2: Error: Badly formed expression.
"unicode.c", line 5: Error: U is not defined.
"unicode.c", line 5: Error: Badly formed expression.
6 Error(s) detected.

% CC -c -xustr=ascii_utf16_ushort unicode.c

% ls -l unicode.o
-rw-rw-r-- 1 gmandali ccuser 1256 Jul 29 18:20 unicode.o

__________________
Technorati tags: Sun Studio | C | C++
Email ThisBlogThis!Share to XShare to Facebook
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • *nix: Workaround to cannot find zipfile directory in one of file.zip or file.zip.zip ..
    Symptom: You are trying to extract the archived files off of a huge (any file with size > 2 GB or 4GB, depending on the OS) ZIP file with...
  • JDS: Installing Sun Java Desktop System 2.0
    This document will guide you through the process of installing JDS 2.0 on a PC from integrated CDROM images Requirements I...
  • Linux: Installing Source RPM (SRPM) package
    RPM stands for RedHat Package Manager. RPM is a system for installing and managing software & most common software package manager used ...
  • Solaris: malloc Vs mtmalloc
    Performance of Single Vs Multi-threaded application Memory allocation performance in single and multithreaded environments is an important a...
  • C/C++: Printing Stack Trace with printstack() on Solaris
    libc on Solaris 9 and later, provides a useful function called printstack , to print a symbolic stack trace to the specified file descripto...
  • Installing MySQL 5.0.51b from the Source Code on Sun Solaris
    Building and installing the MySQL server from the source code is relatively very easy when compared to many other OSS applications. At least...
  • Oracle Apps on T2000: ORA-04020 during Autoinvoice
    The goal of this brief blog post is to provide a quick solution to all Sun-Oracle customers who may run into a deadlock when a handful of th...
  • Siebel Connection Broker Load Balancing Algorithm
    Siebel server architecture supports spawning multiple application object manager processes. The Siebel Connection Broker, SCBroker, tries to...
  • 64-bit dbx: internal error: signal SIGBUS (invalid address alignment)
    The other day I was chasing some lock contention issue with a 64-bit application running on Solaris 10 Update 1; and stumbled with an unexpe...
  • Oracle 10gR2/Solaris x64: Fixing ORA-20000: Oracle Text errors
    First, some facts: * Oracle Applications 11.5.10 (aka E-Business Suite 11 i ) database is now supported on Solaris 10 for x86-64 architectur...

Categories

  • 80s music playlist
  • bandwidth iperf network solaris
  • best
  • black friday
  • breakdown database groups locality oracle pmap sga solaris
  • buy
  • deal
  • ebiz ebs hrms oracle payroll
  • emca oracle rdbms database ORA-01034
  • friday
  • Garmin
  • generic+discussion software installer
  • GPS
  • how-to solaris mmap
  • impdp ora-01089 oracle rdbms solaris tips upgrade workarounds zombie
  • Magellan
  • music
  • Navigation
  • OATS Oracle
  • Oracle Business+Intelligence Analytics Solaris SPARC T4
  • oracle database flashback FDA
  • Oracle Database RDBMS Redo Flash+Storage
  • oracle database solaris
  • oracle database solaris resource manager virtualization consolidation
  • Oracle EBS E-Business+Suite SPARC SuperCluster Optimized+Solution
  • Oracle EBS E-Business+Suite Workaround Tip
  • oracle lob bfile blob securefile rdbms database tips performance clob
  • oracle obiee analytics presentation+services
  • Oracle OID LDAP ADS
  • Oracle OID LDAP SPARC T5 T5-2 Benchmark
  • oracle pls-00201 dbms_system
  • oracle siebel CRM SCBroker load+balancing
  • Oracle Siebel Sun SPARC T4 Benchmark
  • Oracle Siebel Sun SPARC T5 Benchmark T5-2
  • Oracle Solaris
  • Oracle Solaris Database RDBMS Redo Flash F40 AWR
  • oracle solaris rpc statd RPC troubleshooting
  • oracle solaris svm solaris+volume+manager
  • Oracle Solaris Tips
  • oracle+solaris
  • RDC
  • sale
  • Smartphone Samsung Galaxy S2 Phone+Shutter Tip Android ICS
  • solaris oracle database fmw weblogic java dfw
  • SuperCluster Oracle Database RDBMS RAC Solaris Zones
  • tee
  • thanksgiving sale
  • tips
  • TomTom
  • windows

Blog Archive

  • ►  2013 (16)
    • ►  December (3)
    • ►  November (2)
    • ►  October (1)
    • ►  September (1)
    • ►  August (1)
    • ►  July (1)
    • ►  June (1)
    • ►  May (1)
    • ►  April (1)
    • ►  March (1)
    • ►  February (2)
    • ►  January (1)
  • ►  2012 (14)
    • ►  December (1)
    • ►  November (1)
    • ►  October (1)
    • ►  September (1)
    • ►  August (1)
    • ►  July (1)
    • ►  June (2)
    • ►  May (1)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
    • ►  January (2)
  • ►  2011 (15)
    • ►  December (2)
    • ►  November (1)
    • ►  October (2)
    • ►  September (1)
    • ►  August (2)
    • ►  July (1)
    • ►  May (2)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
    • ►  January (1)
  • ►  2010 (19)
    • ►  December (3)
    • ►  November (1)
    • ►  October (2)
    • ►  September (1)
    • ►  August (1)
    • ►  July (1)
    • ►  June (1)
    • ►  May (5)
    • ►  April (1)
    • ►  March (1)
    • ►  February (1)
    • ►  January (1)
  • ►  2009 (25)
    • ►  December (1)
    • ►  November (2)
    • ►  October (1)
    • ►  September (1)
    • ►  August (2)
    • ►  July (2)
    • ►  June (1)
    • ►  May (2)
    • ►  April (3)
    • ►  March (1)
    • ►  February (5)
    • ►  January (4)
  • ►  2008 (34)
    • ►  December (2)
    • ►  November (2)
    • ►  October (2)
    • ►  September (1)
    • ►  August (4)
    • ►  July (2)
    • ►  June (3)
    • ►  May (3)
    • ►  April (2)
    • ►  March (5)
    • ►  February (4)
    • ►  January (4)
  • ►  2007 (33)
    • ►  December (2)
    • ►  November (4)
    • ►  October (2)
    • ►  September (5)
    • ►  August (3)
    • ►  June (2)
    • ►  May (3)
    • ►  April (5)
    • ►  March (3)
    • ►  February (1)
    • ►  January (3)
  • ►  2006 (40)
    • ►  December (2)
    • ►  November (6)
    • ►  October (2)
    • ►  September (2)
    • ►  August (1)
    • ►  July (2)
    • ►  June (2)
    • ►  May (4)
    • ►  April (5)
    • ►  March (5)
    • ►  February (3)
    • ►  January (6)
  • ▼  2005 (72)
    • ►  December (5)
    • ►  November (2)
    • ►  October (6)
    • ►  September (5)
    • ►  August (5)
    • ▼  July (10)
      • Sun Studio C/C++: Support for UTF-16 String Literals
      • Running Enterprise applications on Solaris 10
      • My Favorite Music - A Retro V
      • Solaris: Initialization & Termination routines in ...
      • Sun Studio C/C++: Profile Feedback Optimization (PFO)
      • My Favorite Music - A Retro IV
      • My Favorite Music - A Retro III
      • My Favorite Music - A Retro II
      • My Favorite Music - A Retro I
      • MPlayer: Extracting Audio from a DVD
    • ►  June (8)
    • ►  May (9)
    • ►  April (6)
    • ►  March (6)
    • ►  February (5)
    • ►  January (5)
  • ►  2004 (36)
    • ►  December (1)
    • ►  November (5)
    • ►  October (12)
    • ►  September (18)
Powered by Blogger.

About Me

Unknown
View my complete profile