Creation Zone

Sunday, 4 December 2005

Sun Studio C/C++: Improve performance with -xtarget, -xarch

Posted on 02:28 by Unknown
Even though many software vendors don't support the SPARC v8 architecture (i.e., the pre-UltraSPARC era), for some reason they hesitate to build their software with any -xtarget value other than generic (the default). Perhaps they are not aware of the benefits of specifying the target platform, or they have not spent enough time experimenting with different values to compare performance.

In general, it is always recommended to specify the target platform with the -xtarget option, and the target instruction set architecture with the -xarch option, for better performance. I believe one of the major concerns for software vendors in specifying the target platform is the suspicion that the application may not run on a wide range of platforms. While that is true to some extent, we can still specify a value for the target platform if we know that every supported architecture is compatible with the instruction set we select with the -xarch option.

32-bit SPARC applications, and -xtarget=ultra3 -xarch=v8plusa

For example, for a 32-bit application, if we know for sure that the only supported architecture will be the UltraSPARC chip architecture, it is strongly recommended to build the application with the -xtarget=ultra3 -xarch=v8plusa options. -xarch=v8plusa selects an instruction set that is valid for all the members of the UltraSPARC family (US-I, II, III, III+, IV, IV+, and T1, code named Niagara). -xtarget=ultra3 sets -xchip=ultra3, which tells the optimizer to optimize for best execution on US-III and later systems. The code will run well on US-I and US-II boxes, but possibly a little slower than if it had been optimized for them.

Performance improvement from a real world application

One of our partners, an ISV, has been shipping their product built with -xtarget=generic -xarch=v8plusa for the past few years. Their application supports only the UltraSPARC platform. So I recently experimented with their application by building it with -xtarget=ultra3 -xarch=v8plusa on a US-IV machine. When the application was run on a US-III box with a moderate workload, (not so surprisingly) its run-time performance improved by ~2.5% compared to the numbers from the -xtarget=generic -xarch=v8plusa build. There was no performance regression on a US-II box, where the performance was comparable to the vanilla build, i.e., the one built with -xtarget=generic -xarch=v8plusa. The gains on a US-IV box were comparable to the gains on the US-III box.

These experiments gave the ISV enough confidence to go with the -xtarget=ultra3 -xarch=v8plusa combination, and the next version of their application is being built with those options.

Note:
Do not use -xtarget=ultra3 if there is heavy use of the Sun performance library. In that case you really need separate builds for each target platform, because no single optimized perflib is available that is suitable for all architectures.


Excerpts from Darryl Gove's Selecting the Best Compiler Options article

Darryl Gove, a senior performance engineer at Sun Microsystems, recently posted an article about selecting the best compiler options to improve the run-time performance of applications. Since it has a ton of information about 32-bit and 64-bit applications on UltraSPARC and x64/x86 platforms, I thought of copying the relevant information here for completeness, instead of just pointing to the article.

Specify the Target Platform and Architecture as Explicitly as Possible

The target platform specifies the processor that the application is expected to run on, the minimum processor that is required, and whether the application is 32-bit or 64-bit. For compiler versions prior to the SunStudio 9 release, the compiler specified a generic processor; SunStudio 9 compilers target an UltraSPARC processor for the SPARC architecture, and a generic x86 based processor for the x86 architecture. In all cases it is best to explicitly specify the target processor, since it is possible in some cases for the target processor to depend on the hardware upon which the application is built.

There are a number of compiler flags that specify the target. The flag -xtarget sets all the other flags to appropriate default values for the given target processor: -xarch, -xchip, and -xcache. The flag -xarch sets the instruction set that the processor supports; the flag -xchip specifies how the compiler should use these instructions. Finally, the flag -xcache specifies the structure of the caches for this target (however, this flag may not have any impact for many codes). As with all compiler flags, the order is important: flags accumulate from left to right, and in the event that there are conflicting settings, the flag on the right will override the values of flags which were specified earlier on the command line.
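
The left-to-right rule is easy to model. The sketch below is a toy in Python, not anything the compiler ships: `effective_flags`, `SUBFLAGS` and the `EXPANSIONS` table are hypothetical names, and the strings in `EXPANSIONS` are placeholders rather than the values the compiler really substitutes for -xtarget=ultra3 (the cc man page lists those).

```python
# Toy model of how -xtarget, -xarch, -xchip and -xcache accumulate.
# EXPANSIONS is a stand-in table: its strings are assumed placeholders,
# NOT the real defaults the compiler uses for -xtarget=ultra3.
EXPANSIONS = {
    "ultra3": {
        "-xarch": "arch-from-target",
        "-xchip": "ultra3",
        "-xcache": "cache-from-target",
    },
}

SUBFLAGS = ("-xarch", "-xchip", "-xcache")


def effective_flags(argv, expansions=EXPANSIONS):
    """Walk the flags left to right; a later setting overrides an earlier one."""
    state = {}
    for flag in argv:
        name, _, value = flag.partition("=")
        if name == "-xtarget":
            # -xtarget sets all three sub-flags to the target's defaults.
            state.update(expansions[value])
        elif name in SUBFLAGS:
            state[name] = value
    return state


if __name__ == "__main__":
    print(effective_flags(["-xtarget=ultra3", "-xarch=v8plusa"]))
    print(effective_flags(["-xarch=v8plusa", "-xtarget=ultra3"]))
```

With -xarch written after -xtarget, the explicit v8plusa survives; with the order reversed, the target's default replaces it. That is why the explicit -xarch is normally placed to the right of -xtarget.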

A point to be cautious of is that specifying a more recent hardware target may mean that older hardware is no longer able to run the application. In particular, specifying the target as being an UltraSPARC platform means that the application will no longer run on pre-UltraSPARC processors (however, UltraSPARC processors have been shipping for over 10 years). Similarly, specifying an Opteron processor will mean that the code no longer runs on x86-compatible processors that do not have the SSE2 instruction set extensions.

Specifying the target platform for the UltraSPARC processor family

For UltraSPARC processors, a generally good option pair to use is -xtarget=ultra3 with -xarch=v8plusa. These options allow the compiler to generate 32-bit code that can run on all the members of the UltraSPARC family and their follow-ons (UltraSPARC I, UltraSPARC II, UltraSPARC III, UltraSPARC IV). The compiler will also schedule the code especially for the UltraSPARC III. These options represent a good compromise, since code scheduled for the UltraSPARC III is better at taking advantage of the new features of the UltraSPARC III architecture, while still providing good performance on previous generations of processors.

If the application requires the capability to address 64-bit memory addresses, then the appropriate flags to use are -xtarget=ultra3 -xarch=v9a which adds 64-bit addressing whilst still targeting all the members of the UltraSPARC family of processors.


Recommended compiler flags for the UltraSPARC platform
32-bit code    -xtarget=ultra3 -xarch=v8plusa
64-bit code    -xtarget=ultra3 -xarch=v9a

Specifying the target processor for the x64 processor family

By default the compiler targets a 32-bit generic x86 based processor, so the code will run on any x86 processor from a Pentium Pro up to an AMD Opteron architecture. Whilst this produces code that can run over the widest range of processors, it does not take advantage of the extensions offered by the Opteron family of processors. Consequently, it is recommended that 32-bit code target the Opteron processor; this will generate code that runs on processors (such as the Pentium 4 and Opteron) which support the SSE2 instruction set extensions.

To take advantage of the x64 processor family and the advantages of 64-bit code, the appropriate compiler flags are -xtarget=opteron -xarch=amd64.

Recommended compiler flags for the x64 platform
32-bit code    -xtarget=opteron
64-bit code    -xtarget=opteron -xarch=amd64
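
The two recommendation tables (UltraSPARC and x64) can be folded into one lookup for a build script. This is only a sketch in Python: the function name `recommended_flags`, the platform keys "sparc" and "x64", and the file name app.c are made up for illustration. Only the flag values come from the article.

```python
# Recommended target flags, copied from the two tables in the article.
# The dictionary keys ("sparc", "x64") are hypothetical labels.
RECOMMENDED = {
    ("sparc", 32): ["-xtarget=ultra3", "-xarch=v8plusa"],
    ("sparc", 64): ["-xtarget=ultra3", "-xarch=v9a"],
    ("x64", 32): ["-xtarget=opteron"],
    ("x64", 64): ["-xtarget=opteron", "-xarch=amd64"],
}


def recommended_flags(platform, bits):
    """Return the recommended target flags for a platform and word size."""
    try:
        return list(RECOMMENDED[(platform, bits)])
    except KeyError:
        raise ValueError(f"no recommendation for {platform!r}, {bits}-bit") from None


if __name__ == "__main__":
    # The command line is only printed, never executed.
    print(" ".join(["cc", *recommended_flags("sparc", 32), "app.c"]))
```

Keeping -xtarget first and -xarch second in each entry preserves the order used in the tables, so the explicit -xarch stays to the right of -xtarget on the command line.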

Using -xtarget=generic

The compiler also supports the options -xtarget=generic and -xtarget=generic64. These options tell the compiler to produce code which runs well on as wide a range of machines as possible. One feature of these flags is that they will be interpreted appropriately on both the SPARC and x64 platforms -- so using them may mean fewer changes to makefile flags. The following table shows how the compiler will interpret the -xtarget=generic flag on both the SPARC and x64 platforms.

Flag                  SPARC                  x64
-xtarget=generic      V8plus architecture    386 architecture
-xtarget=generic64    V9 architecture        AMD64 architecture
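
To see why the generic flags can save makefile changes, the table can be read as a lookup: one flag, two interpretations. The Python below is an illustrative sketch only. `GENERIC_ARCH`, `generic_arch` and the platform keys are hypothetical names, and the architecture strings are taken from the table.

```python
# Interpretation of the generic targets, copied from the table in the article.
GENERIC_ARCH = {
    "-xtarget=generic": {"sparc": "V8plus", "x64": "386"},
    "-xtarget=generic64": {"sparc": "V9", "x64": "AMD64"},
}


def generic_arch(flag, platform):
    """Architecture a generic flag stands for on the given platform."""
    return GENERIC_ARCH[flag][platform]


if __name__ == "__main__":
    # The same flag string serves both platforms.
    for platform in ("sparc", "x64"):
        print(platform, generic_arch("-xtarget=generic64", platform))
```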


Credit:
Darryl Gove, Sun Product Technical Support JSE EMEA