Thursday, 3 February 2005

Solaris 9 or later: More performance with Large Pages (MPSS)

Three simple steps that may improve the performance of a native application on Solaris 9 or later:



  1. Run the application under maximum load and collect trapstat data (the command below takes ten samples at 10-second intervals):

    trapstat -T 10 10


    Check the percentage of time (%tim) spent handling dTLB misses.


  2. Preload the mpss.so.1 interposing library of Solaris and configure the application to use large pages. This can be done by writing a simple wrapper around the invocation of the application.


    You can check the supported page sizes on your machine by typing "pagesize -a". Common page sizes on SPARC systems: 8K (default), 64K, 512K, 4M



    Choose the page size for the application wisely; otherwise a lot of memory may be wasted, degrading the performance of the whole system.



    Create the wrapper script as follows, or, if you already have a script, add the following lines (up to the export line) to it:


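    #!/bin/ksh

    # Preload the MPSS interposing library
    LD_PRELOAD=mpss.so.1
    # Configuration file read by mpss.so.1 (created in the next step)
    MPSSCFGFILE=/tmp/mpsscfg
    # File to which mpss.so.1 writes its error messages
    MPSSERRFILE=/tmp/mpsserr

    export MPSSCFGFILE MPSSERRFILE LD_PRELOAD

    exec <application name> <args to application>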


    Then create a simple configuration for MPSS:


    For example, if the application is named myapp, the following line creates an MPSS configuration file that lets the application use 4M pages for its heap (default: 8K) and 64K pages for its stack (default: 8K):


    echo "myapp*:4M:64K" > /tmp/mpsscfg


  3. Finally, run the application through the wrapper script, collect the trapstat statistics again, and measure the difference in performance.
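The page-size check and the configuration step above can be scripted. The following is a minimal sketch in POSIX/ksh shell, not part of Solaris or MPSS: the function names to_bytes, is_supported and write_mpss_cfg are hypothetical, and it assumes that `pagesize -a` prints each supported page size in bytes, one per line. The entry format written to the file is the one used in the myapp example above.

```shell
# Sketch: verify that the requested page sizes are supported before
# writing an MPSS configuration entry. Function names are hypothetical.

# Convert a size such as 8K, 64K or 4M to bytes.
to_bytes() {
    case $1 in
        *K) echo $(( ${1%K} * 1024 )) ;;
        *M) echo $(( ${1%M} * 1024 * 1024 )) ;;
        *)  echo "$1" ;;
    esac
}

# Succeed if size $1 appears in the list of supported page sizes
# (bytes, one per line) read from standard input, for example the
# output of "pagesize -a".
is_supported() {
    want=$(to_bytes "$1")
    while read -r size; do
        [ "$size" = "$want" ] && return 0
    done
    return 1
}

# Write one "name*:heap:stack" entry to the given configuration file.
write_mpss_cfg() {
    echo "$1*:$2:$3" > "$4"
}

# Example usage (on Solaris):
#   sizes=$(pagesize -a)
#   if echo "$sizes" | is_supported 4M && echo "$sizes" | is_supported 64K; then
#       write_mpss_cfg myapp 4M 64K /tmp/mpsscfg
#   fi
```

Reading the supported sizes from standard input keeps the check independent of the machine it runs on, so the same functions can be tried against a saved `pagesize -a` listing.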


With large pages, the application's performance may improve due to the reduced number of dTLB misses.



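For example, the following data was collected by running Siebel with the default 8K pages on a Sun V480 server. In trapstat's ttl (total) rows, the columns are itlb-miss, %tim, itsb-miss, %tim | dtlb-miss, %tim, dtsb-miss, %tim | total %tim: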


sdcv480s002:/export/home/sunperf/perf_tools/%grep ttl trapstat-vanilla.txt

ttl | 918305 5.2 9363 0.4 | 1148524 8.0 66553 3.2 |16.7
ttl | 990784 5.6 9888 0.4 | 1202256 8.4 67298 3.3 |17.6
ttl | 960221 5.4 9764 0.4 | 1192122 8.3 68607 3.3 |17.5
ttl | 982697 5.6 9934 0.4 | 1232264 8.5 69221 3.3 |17.8
ttl | 1007827 5.7 10295 0.4 | 1273141 8.8 72519 3.5 |18.5
ttl | 1011441 5.7 10031 0.4 | 1222785 8.5 69450 3.4 |18.1
ttl | 961155 5.4 9469 0.4 | 1191395 8.2 65668 3.2 |17.2
ttl | 1019467 5.8 11088 0.5 | 1265553 8.9 77352 3.8 |18.9
ttl | 1009262 5.7 10638 0.4 | 1276510 8.9 74925 3.6 |18.7
ttl | 1021536 5.8 10554 0.4 | 1280768 8.9 72188 3.5 |18.6


The following data, collected with 4M pages, shows the reduced number of dTLB and dTSB misses:


sdcv480s002:/export/home/sunperf/perf_tools/%grep ttl 771mpss-trapstat.txt

ttl | 1497319 8.4 1082 0.0 | 131236 1.1 2577 0.1 | 9.7
ttl | 1305635 7.4 1020 0.0 | 117982 1.0 2483 0.1 | 8.5
ttl | 1626490 9.2 1028 0.0 | 145789 1.2 2754 0.1 |10.6
ttl | 1424718 8.0 1063 0.0 | 130317 1.1 2665 0.1 | 9.3
ttl | 1411515 8.0 982 0.0 | 126710 1.1 2532 0.1 | 9.2
ttl | 1443108 8.1 925 0.0 | 128753 1.1 2577 0.1 | 9.4
ttl | 1512549 8.5 1037 0.0 | 131343 1.1 2518 0.1 | 9.8
ttl | 1246909 7.0 834 0.0 | 107194 0.9 2299 0.1 | 8.1
ttl | 1387027 7.8 1004 0.0 | 126112 1.1 2636 0.1 | 9.1
ttl | 1477135 8.3 989 0.0 | 126294 1.1 2477 0.1 | 9.6


With 4M pages, the application performed nearly 7% better than the vanilla run, and the total time spent handling TLB/TSB misses fell from roughly 17-19% to roughly 8-11%.
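To compare two runs like the ones above, the ttl rows can be averaged. The sketch below is a hypothetical avg_ttl shell function built on awk (it is not part of trapstat); it assumes the ttl row layout shown above, with the itlb, dtlb and total groups separated by "|".

```shell
# Sketch: average the dTLB, dTSB and total %tim columns of trapstat
# "ttl" rows. avg_ttl is a hypothetical helper; pass trapstat output
# files as arguments or pipe the output on standard input.
avg_ttl() {
    awk -F'[|]' '
        $1 ~ /^ttl/ {
            # Third |-separated field: dtlb-miss %tim dtsb-miss %tim
            split($3, d, " ")
            dtlb += d[2]; dtsb += d[4]
            # Fourth field: total %tim spent handling TLB/TSB misses
            total += $4
            n++
        }
        END {
            if (n > 0)
                printf "rows=%d dtlb=%.2f%% dtsb=%.2f%% total=%.2f%%\n",
                       n, dtlb / n, dtsb / n, total / n
        }' "$@"
}

# Example usage:
#   avg_ttl trapstat-vanilla.txt
#   avg_ttl 771mpss-trapstat.txt
```

Splitting on "|" rather than on whitespace copes with both "|16.7" and "| 9.7" in the last column.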
