Creation Zone

Thursday, 4 August 2005

Sun Studio C/C++: Annotated listing (compiler commentary) with er_src

Posted on 18:10 by Unknown
Wouldn't it be nice to know exactly what the compiler did with the optimization flags we specify on the compile line? We pass a wide variety of compiler options in the hope that the resulting binary performs better. But unless we know for sure that a certain flag helps, it often appears that the compiler is doing nothing, and we may even suspect that some options exist only to give the user a placebo effect.

Sun ships a tool called er_src with the Sun Studio compilers so that users can examine the optimizations the compiler(s) performed. With er_src available, compiler optimizations no longer have to be treated as a "black box".

When the source is compiled with the -g (debug) flag, the compiler logs most of its actions in the stabs/DWARF debugging information of the ELF object file. These compiler-generated messages are called "compiler commentary". With -g, both the commentary and the location of the source file are stored in the object file (.o). er_src reads the source file and interleaves the commentary with the source, at the places where the compiler performed an optimization or transformation. Consequently, the original source file must still exist at the path that was recorded in the object file during compilation. [Thanks to Chris Quenelle for the correction]

Run as er_src <object-file>, er_src dumps the entire source along with the compiler commentary. It can also show the commentary and the disassembly for all functions or for selected ones. Note that er_src even accepts Java class (.class) files.

Read the er_src man page for an explanation of how to interpret the compiler commentary in object files, for example to determine for which functions the compiler actually made a substitution.

For example:
% cat string.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int __strcmp(const char *str1, const char *str2 ) {
    int rc = 0;

    for(;;) {
        rc = *str1 - *str2;
        if(rc != 0 || *str1 == 0) {
            return (rc);
        }
        ++str1;
        ++str2;
    }
}

int __strlen(const char *str) {
    int length = 0;

    for(;;) {
        if (*str == 0) {
            return (length);
        } else {
            ++length;
            ++str;
        }
    }
}

char *__strreverse(const char *str) {
    int i, length = 0;
    char *revstr = NULL;

    length = __strlen(str);
    revstr = (char *) malloc (sizeof (char) * length);

    for (i = length; i > 0; --i) {
        *(revstr + i - 1) = *(str + length - i);
    }

    return (revstr);
}


int main() {
    printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));
    printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));
    printf("\nreverse(Solaris10) = %s", __strreverse("Solaris10"));

    return (0);
}
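
One caveat about this test program: __strreverse allocates exactly length bytes and never stores a terminating NUL, so printing its result with %s reads past the end of the buffer; the correct-looking output later in this post is a matter of luck. The er_src listings below are left as they are because they were generated from this exact source. A minimal corrected sketch follows; strreverse_fixed is a hypothetical name, and it uses the standard strlen() instead of __strlen:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical corrected variant of __strreverse: reserves room for the
   terminating NUL, checks malloc, and terminates the result.
   The caller is responsible for free()ing the returned string. */
char *strreverse_fixed(const char *str) {
    size_t i, length = strlen(str);
    char *revstr = malloc(length + 1);

    if (revstr == NULL)
        return NULL;
    for (i = 0; i < length; ++i)
        revstr[i] = str[length - 1 - i];   /* last char first */
    revstr[length] = '\0';                 /* the missing terminator */
    return revstr;
}
```

Changing string.c this way would, of course, shift the source line numbers that er_src reports.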

To see what the compiler does with this code at optimization level 4, compile the source with the -g and -xO4 options:
% cc -c -g -xO4 string.c

% er_src string.o
Source file: ./string.c
Object file: ./string.o
Load Object: ./string.o

1. #include <stdio.h>
2. #include <string.h>
3. #include <stdlib.h>
4.
5. int __strcmp(const char *str1, const char *str2 ) {


Bounds test for loop below moved to top of loop
6. int rc = 0;
7.
8. for(;;) {
9. rc = *str1 - *str2;
10. if(rc != 0 || *str1 == 0) {
11. return (rc);
12. }
13. ++str1;
14. ++str2;
15. }
16. }
17.
18. int __strlen(const char *str) {


Bounds test for loop below moved to top of loop
19. int length = 0;
20.
21. for(;;) {
22. if (*str == 0) {
23. return (length);
24. } else {
25. ++length;
26. ++str;
27. }
28. }
29. }
30.
31. char *__strreverse(const char *str) {
32. int i, length = 0;
33. char *revstr = NULL;
34.

Function __strlen inlined from source file string.c into the code for the following line
Bounds test for loop below moved to top of loop

35. length = __strlen(str);
36. revstr = (char *) malloc (sizeof (char) * length);
37.

Loop below scheduled with steady-state cycle count = 3 <= indicates that
software pipelining (modulo scheduling) has been applied

Loop below unrolled 1 times
Loop below has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 0 FPmuls, and 0 FPdivs per iteration

38. for (i = length; i > 0; --i) {
39. *(revstr + i - 1) = *(str + length - i);
40. }
41.
42. return (revstr);
43. }
44.
45.
46. int main() {

Function __strcmp inlined from source file string.c into the code for the following line
Bounds test for loop below moved to top of loop

47. printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));

Function __strlen inlined from source file string.c into the code for the following line
Bounds test for loop below moved to top of loop

48. printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));

Function __strreverse inlined from source file string.c into the code for the following line
Function __strlen inlined from source file string.c into inline copy of function __strreverse
Bounds test for loop below moved to top of loop
Loop below scheduled with steady-state cycle count = 3
Loop below unrolled 1 times
Loop below has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 0 FPmuls, and 0 FPdivs per iteration

49. printf("\nreverse(Solaris10) = %s", __strreverse("Solaris10"));
50.
51. return (0);
52. }
From this listing, it is clear that the compiler tried its best to optimize the code by inlining the routines, moving loop bounds tests, unrolling loops and software-pipelining them. Of course, these are the things it is supposed to do at the documented -xO4 level. But since the compiler's predictions may not always be correct, it is the responsibility of the user (say, the developer) to find out how the code is being laid out and, if not satisfied with the outcome, to give the compiler more hints through compiler-supported pragmas, profile feedback, rearranging the code, etc.
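
As a sketch of such hints, assuming the Sun Studio C pragmas #pragma no_inline (funcname) and #pragma unroll (factor) behave as described in the Sun Studio C User's Guide: the first asks the compiler not to inline calls to the named function in this file, and the second suggests an unroll factor for the for loop that immediately follows it. The function names my_strlen and my_strreverse_into are hypothetical. Compilers that do not recognize these pragmas may ignore them (typically with a warning) or interpret them differently.

```c
/* Hypothetical helper: length of a NUL-terminated string. */
int my_strlen(const char *str) {
    int length = 0;
    while (str[length] != 0)
        ++length;
    return length;
}

/* Sun Studio C (assumed syntax): suggest that calls to my_strlen in
   this file are not inlined. The function is declared above the pragma. */
#pragma no_inline (my_strlen)

/* Hypothetical helper: writes the reverse of str into out, which must
   hold at least my_strlen(str) + 1 bytes. */
void my_strreverse_into(const char *str, char *out) {
    int i, length = my_strlen(str);

    /* Sun Studio C (assumed syntax): suggest unrolling the next for
       loop by a factor of 4. */
#pragma unroll (4)
    for (i = length; i > 0; --i)
        out[i - 1] = str[length - i];
    out[length] = '\0';
}
```

After recompiling with -g -xO4, running er_src again should show whether the compiler actually honored these hints.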

Now let's see what the compiler thinks of the same code if we provide some feedback about the program's run-time behavior.
% cc -g -xO4 -xprofile=collect -o string string.c

% ./string

strcmp(pod, podcast) = -99 <- returns 0 if matches
strlen(Solaris10) = 9
reverse(Solaris10) = 01siraloS

% ls -ld string.profile
drwxrwxrwx 2 build engr 512 Aug 4 17:23 string.profile/

% cc -g -xO4 -xprofile=use:string -c string.c

% er_src string.o
Source file: ./string.c
Object file: ./string.o
Load Object: ./string.o

1. #include <stdio.h>
2. #include <string.h>
3. #include <stdlib.h>
4.
5. int __strcmp(const char *str1, const char *str2 ) {

6. int rc = 0;
7.
8. for(;;) {
9. rc = *str1 - *str2;
10. if(rc != 0 || *str1 == 0) {
11. return (rc);
12. }
13. ++str1;
14. ++str2;
15. }
16. }
17.
18. int __strlen(const char *str) {

19. int length = 0;
20.
21. for(;;) {
22. if (*str == 0) {
23. return (length);
24. } else {
25. ++length;
26. ++str;
27. }
28. }
29. }
30.
31. char *__strreverse(const char *str) {

32. int i, length = 0;
33. char *revstr = NULL;
34.

Function __strlen not inlined because the profile-feedback execution count is too low
35. length = __strlen(str);
36. revstr = (char *) malloc (sizeof (char) * length);
37.

Loop below scheduled with steady-state cycle count = 3
Loop below unrolled 1 times
Loop below has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 0 FPmuls, and 0 FPdivs per iteration

38. for (i = length; i > 0; --i) {
39. *(revstr + i - 1) = *(str + length - i);
40. }
41.
42. return (revstr);
43. }
44.
45.
46. int main() {


Function __strcmp not inlined because the profile-feedback execution count is too low
47. printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));

Function __strlen not inlined because the profile-feedback execution count is too low
48. printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));

Function __strreverse not inlined because the profile-feedback execution count is too low
49. printf("\nreverse(Solaris10) = %s", __strreverse("Solaris10"));
50.
51. return (0);
52. }
This time, the compiler decided that inlining the routines was not worthwhile because their execution frequency was too low (each call site executed only once in this run). That is exactly what profile feedback optimization is supposed to do, i.e., optimize the code based on run-time feedback. In this example, -xO4 and -xprofile (profile feedback optimization) work together to make the best decision.

A few more examples:
To list all functions in a given object file:
% er_src -func string.o

Functions sorted in lexicographic order

Load Object:

Address Size Name

0x00000000 64 __strcmp
0x00000040 72 __strlen
0x00000088 184 __strreverse
0x00000140 372 main

To print only the compiler commentary related to inlining:
% er_src -cc inline string.o
...
...

29. }
30.
31. char *__strreverse(const char *str) {
32. int i, length = 0;
33. char *revstr = NULL;
34.

Function __strlen inlined from source file string.c into the code for the following line
35. length = __strlen(str);
36. revstr = (char *) malloc (sizeof (char) * length);
37.
...
...

43. }
44.
45.
46. int main() {

Function __strcmp inlined from source file string.c into the code for the following line
47. printf("\nstrcmp(pod, podcast) = %d", __strcmp("pod", "podcast"));

Function __strlen inlined from source file string.c into the code for the following line
48. printf("\nstrlen(Solaris10) = %d", __strlen("Solaris10"));

Function __strreverse inlined from source file string.c into the code for the following line
Function __strlen inlined from source file string.c into inline copy of function __strreverse
49. printf("\nreverse(Solaris10) = %s", __strreverse("Solaris10"));
50.
51. return (0);
52. }

To print the annotated disassembly:
% er_src -disasm all -1 string.o
---------------------------------------
Annotated disassembly
---------------------------------------
Source file: ./string.c
Object file: ./string.o
Load Object: ./string.o

1. #include <stdio.h>
2. #include <string.h>
3. #include <stdlib.h>
4.
5. int __strcmp(const char *str1, const char *str2 ) {

[ 5] 0: ldsb [%o0], %o4
[ 5] 4: mov %o0, %o3

Bounds test for loop below moved to top of loop
6. int rc = 0;
7.
8. for(;;) {
9. rc = *str1 - *str2;
[ 9] 8: ldsb [%o1], %o5
[ 9] c: subcc %o4, %o5, %o0
10. if(rc != 0 || *str1 == 0) {
[10] 10: bne,pn %icc,0x38
[10] 14: cmp %o4, 0
[10] 18: be,pn %icc,0x38
11. return (rc);
12. }
13. ++str1;
[13] 1c: inc %o3
[ 9] 20: ldsb [%o1 + 1], %o5
[ 9] 24: ldsb [%o3], %o4
14. ++str2;
[14] 28: inc %o1
[ 9] 2c: subcc %o4, %o5, %o0
[10] 30: be,pt %icc,0x18
[10] 34: cmp %o4, 0
[11] 38: retl
[11] 3c: nop
15. }
16. }
17.
...
...
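
To connect the disassembly with the commentary, here is one hand-written reading of the __strcmp code above, rendered back into C. strcmp_rotated is a hypothetical name, and this is an interpretation, not compiler output. The first pair of characters is compared before the loop is entered, and inside the loop the end-of-string test at offset 0x18 comes first, which appears to be what "Bounds test for loop below moved to top of loop" refers to. On SPARC, the instruction right after a branch (its delay slot) executes along with the branch, so the cmp %o4, 0 at 0x14 and at 0x34 sets up the end-of-string test at 0x18.

```c
/* Hand-written C reading of the -xO4 SPARC code for __strcmp
   (hypothetical name; an interpretation, not compiler output). */
int strcmp_rotated(const char *str1, const char *str2) {
    int c1 = *str1;          /* 0x00: ldsb [%o0], %o4              */
    int rc = c1 - *str2;     /* 0x08-0x0c: ldsb [%o1]; subcc        */

    if (rc != 0)             /* 0x10: bne,pn -> 0x38 (return)       */
        return rc;
    while (c1 != 0) {        /* 0x14 cmp / 0x18 be,pn -> 0x38       */
        ++str1;              /* 0x1c: inc %o3                       */
        ++str2;              /* 0x28: inc %o1                       */
        c1 = *str1;          /* 0x24: ldsb [%o3], %o4               */
        rc = c1 - *str2;     /* 0x20 ldsb [%o1 + 1]; 0x2c subcc     */
        if (rc != 0)         /* 0x30: be,pt loops back only if equal */
            break;
    }
    return rc;               /* 0x38: retl                          */
}
```

It returns the same values as the original __strcmp, e.g. -99 for ("pod", "podcast"), matching the program output shown earlier.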