Extracting Symmetrix Data with Orca

The Symmetrix is EMC's flagship storage device. It's big. It's bad. And it can sometimes be a bit opaque, meaning that performance data can be difficult to extract. EMC produces a range of tools -- some graphical, and some that can be called from the command line -- but using these to generate long-term performance statistics, in a management-friendly format, can be a bit of a struggle.


February 04, 2004
URL:http://www.drdobbs.com/extracting-symmetrix-data-with-orca/199101802

Tom Kranz

Orca is an open source package by Blair Zajac that uses RRDtool to plot arbitrary data from text files. Traditionally, Orca (http://www.orcaware.com/orca) has been used in conjunction with the SymbEL Toolkit to plot Solaris performance data. In this article, I will show how to use EMC's command-line tools, in conjunction with Orca, to plot performance statistics for your Symmetrix.

For the purposes of this article, I'm assuming that you have rsync, ssh, and some sort of Web server installed -- these should form part of every sys admin's toolbox. You'll also need SYMCLI installed on the host connected to the Symmetrix (I have version 4.1), and Orca already installed and configured. I highly recommend Orca if only for the out-of-the-box Solaris performance monitoring it gives.

About Orca

A process (called orcallator) runs on a host and logs data into a file. Using rsync, this data file can then be copied to a central server, where Orca is run to process the data. Orca places that data into RRDtool databases, and then generates graphs from those databases. Orca works out averages and percentage increases by using the date stamps, which are written to each line of data in the orcallator output file. The graphs and output files are then placed in a directory where your Web server can see them, and voilà! Your performance data is available via a Web browser. See Listing 1.

Collecting Data from Symmetrix

The trick here is to pull performance data from the Symmetrix and then log it in a file that Orca can understand. symstat is the SYMCLI tool that queries the Symmetrix and returns performance statistics. Listing 2 (orca-symmetrix_probe.pl) is a Perl script that calls symstat, parses the output, and then generates a data file that Orca can understand.

We'll collect all the performance output from symstat -- it's all useful, and is best looked at all together rather than by focusing on one statistic (e.g., writes per second).

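For each device, symstat provides the following data, all of which Listing 2 logs: I/O reads and I/O writes per second (Figure 1), kilobytes read (Figure 2) and written per second, cache reads and cache writes (Figure 3), sequential reads, and write pending tracks.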

A normal I/O read is one in which a single block of data is requested and received. A sequential read is one in which several blocks of data are requested, all next to each other on disk. The disk head can position itself once and then read the blocks one after the other, which is an efficient way to get data off the disks. The write pending (wp) tracks figure (Figure 4) counts the tracks the Symmetrix is sitting on in cache; it waits for a drop in I/O before it stages them out of cache and down to disk.

One of the big advantages of getting Orca to graph these figures is that we can easily look at them together, over time. For instance, a high number of read I/Os, combined with a large number of kilobytes being read and a high number of sequential reads, could be the normal result of large table scans in your Oracle database (which might not be a problem).

A large number of small writes that aren't hitting the cache (a large number of I/O writes, a small value for KB written, and a small number of cache writes) means that overall I/O performance will suffer, with the Symmetrix constantly having to write to disk. You'd need to look at what sort of I/O is taking place, or maybe invest in some more cache.
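Both judgments above rest on the average transfer size: KB per second divided by I/Os per second. The following is a minimal sketch that assumes you feed it the iowrite/kbwrite pair (or the ioread/kbread pair) for one device, taken from a line of the probe file. The function name avg_io_size is hypothetical.

```shell
#!/bin/sh
# avg_io_size IOPS KBPS
# Hypothetical helper: prints the average transfer size in KB
# (KB/sec divided by I/O per sec), or 0.0 when there was no I/O.
avg_io_size() {
    awk -v io="$1" -v kb="$2" 'BEGIN {
        if (io + 0 == 0) { printf "0.0\n"; exit }
        printf "%.1f\n", kb / io
    }'
}

# 200 writes/sec moving 800 KB/sec is an average of 4 KB per write
avg_io_size 200 800    # prints 4.0
```

A consistently small result for writes, alongside a low cache-write figure, is the pattern described above.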

You can use this information to build an I/O "footprint" of your applications. Then, using Orca's historical graphs, you can get an idea of when things are OK and when they are not. You can also use the Orca graphs for capacity planning, or for gauging the impact a new software release will have.

It's worthwhile keeping an eye on the number of write pending tracks and comparing it to the number and size of the write I/Os occurring. Ideally, you want the Symmetrix to be busy, but not so busy that it gets overwhelmed.
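You can make that comparison without flicking between graphs by pulling the three write-related columns for one device out of a probe file and printing them side by side. The sketch below uses a hypothetical helper, show_writes. It assumes the layout Listing 2 produces: a header line that starts with timestamp, whose column names are a device label followed by a statistic name (such as 001wptracks). The helper looks columns up by name and skips every header line, so it copes with the header that Listing 2 writes on each run.

```shell
#!/bin/sh
# show_writes FILE DEVICE
# Hypothetical helper: print timestamp, iowrite, kbwrite and wptracks
# for one device label (e.g. 001) from a probe file, finding each
# column by its name in the header line.
show_writes() {
    awk -v dev="$2" '
        BEGIN { kio = dev "iowrite"; kkb = dev "kbwrite"; kwp = dev "wptracks" }
        # Header line (repeated on every run): map column name to field number
        $1 == "timestamp" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data line: print the wanted fields if the header named all three
        NF && (kio in col) && (kkb in col) && (kwp in col) {
            print $1, $(col[kio]), $(col[kkb]), $(col[kwp])
        }
    ' "$1"
}
```

Run it against one of the daily probe-* files, giving a device label such as 001 as the second argument.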

When pulling the data from the Symmetrix, there are two main things to remember:

1. We need the epoch timestamp (seconds since 1 January 1970 UTC); Orca uses this when working out averages over time.

2. If symstat returns nothing for a device (which it can do regularly), we need to trap this and write a 0 for each of that device's values in the output file.
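The two rules above can be sketched in shell as follows. device_fields is a hypothetical helper that applies rule 2 to one device's values. The count of eight comes from the columns that Listing 2 collects per device.

```shell
#!/bin/sh
# device_fields [VALUE ...]
# Hypothetical helper for rule 2: any N/A becomes 0, and if symstat
# returned nothing at all, eight zeros are printed (eight being the
# number of columns per device in Listing 2).
device_fields() {
    if [ $# -eq 0 ]; then
        set -- 0 0 0 0 0 0 0 0
    fi
    out=
    for v in "$@"; do
        [ "$v" = "N/A" ] && v=0
        out="${out:+$out }$v"
    done
    printf '%s\n' "$out"
}

# Rule 1: each data line starts with the epoch time. Listing 2 uses
# Perl's time; GNU date +%s prints the same value, but not every
# date(1) supports %s.
epoch=$(date +%s)
# One line for two devices; the second returned nothing from symstat
printf '%s %s %s\n' "$epoch" "$(device_fields 5 N/A 40 0 100 N/A 0 12)" "$(device_fields)"
```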

orca-symmetrix_probe.pl (Listing 2) takes symstat data and creates Orca-friendly output files, one per day. It should be called from root's crontab every 5 minutes. Once it's running, you should see the output file being generated and updated in /var/log/probe. The script doesn't create that directory, so make sure it exists.
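As a sketch, root's crontab entry on the host attached to the Symmetrix could look like the one below. The install path /usr/local/bin is an assumption. The minutes are spelled out because the Solaris cron of this era doesn't accept step values such as */5.

```
# root's crontab on the host attached to the Symmetrix
# (assumed script location: /usr/local/bin)
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/local/bin/orca-symmetrix_probe.pl
```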

I have been using this script against a Symmetrix 3630. If you have a different breed of Symmetrix, you might notice that symstat reports slightly different information; in that case, you'll need to edit the script to suit.
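Listing 2 hard-codes the two hex volumes it needs to rename: 00A and 00B become 010 and 011, their decimal values. On a box with more volumes, a pair of hypothetical helpers could compute the mapping instead. This sketch assumes the volume IDs are three-character hex strings like the ones symstat prints, and that you keep the same decimal-label scheme.

```shell
#!/bin/sh
# Hypothetical helpers generalizing Listing 2's hard-coded mapping:
# label each volume with its decimal value, padded to three digits,
# so every column label ends in a digit.

# hex_to_label HEXID   e.g. 00A -> 010
hex_to_label() {
    printf '%03d\n' "0x$1"
}

# label_to_hex LABEL   e.g. 011 -> 00B
# Leading zeros are stripped first, so that no tool reads 011 as octal.
label_to_hex() {
    awk -v n="$1" 'BEGIN { sub(/^0+/, "", n); printf "%03X\n", n + 0 }'
}

for id in 001 009 00A 00B; do
    label=$(hex_to_label "$id")
    echo "$id -> $label -> $(label_to_hex "$label")"
done
```

Once you pass 00B, labels and hex IDs diverge further (hex 010 becomes label 016, for example). The @logicaldisks list and the symstat lookup therefore have to be converted consistently.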

Once we're gathering data, we need a mechanism to copy those output files back to the central Orca machine for processing. Listing 3 (orca-emc_getlogs.ksh) should be run from the Orca user's crontab every 30 minutes. Be sure its execution doesn't coincide with the orca-symmetrix_probe.pl script (Listing 2); otherwise, you'll be copying an incomplete data file.
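One way to keep the two jobs apart is to offset them in time. The entry below, for the Orca user's crontab on the central server, is a sketch. The script path is an assumption, and the 4-minute offset assumes that the probe run starting on the hour or half hour has finished by then. Listing 2 calls symstat once per volume, one after another, so time a run on your own host and adjust the offset. The clocks on both machines also need to agree.

```
# Orca user's crontab on the central Orca server
# (assumed script location: /usr/local/orca/bin)
4,34 * * * * /usr/local/orca/bin/orca-emc_getlogs.ksh
```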

Essentially, this script calls rsync to copy across the data files, and then calls Orca to process that data. I'm running rsync over ssh because of the secure, password-less connection that ssh offers. rsync itself copies only the data that has changed, which vastly reduces network traffic.

When the script calls Orca, it passes the configuration file (Listing 1), which tells Orca which values to take from the data files and how to plot them. With this setup in place, you should be able to launch a Web browser and view the performance graphs for your Symmetrix.
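The config file links data columns to graphs through patterns such as (.*\d)ioread, with $1 becoming the legend. The shell sketch below imitates that pattern with awk to show which column names of the kind Listing 2 writes it picks up; the helper name matched_legends is hypothetical. The sketch also shows why a label ending in a hex letter, such as 00A, would be missed.

```shell
#!/bin/sh
# matched_legends HEADER_LINE
# Hypothetical helper imitating the data pattern (.*\d)ioread from
# the Orca config: print the captured part (used as the legend) of
# every column name that matches. The awk version anchors the pattern
# to the whole name, which makes no difference for these names.
matched_legends() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            if ($i ~ /^.*[0-9]ioread$/)
                out = out (out == "" ? "" : " ") substr($i, 1, length($i) - 6)
        print out
    }'
}

# 00Aioread has no digit right before "ioread", so it never matches
matched_legends "timestamp 001ioread 001iowrite 00Aioread 010ioread"   # prints: 001 010
```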

Whether the information you gather is sufficient to help you secure an extra gigabyte of cache is another matter, but at least you know how the Symmetrix is performing.

My thanks to Mick Sheppard for his invaluable help with the intricacies of Perl.

Tom Kranz is a contract sys admin with more than 7 years of experience. His main skills lie with Solaris, IRIX, and SANs. When he's not stuck in front of his Octane, he can be found herding his children round in a classic Rover.

Figure 1 Yearly I/O writes

Figure 2 Yearly KB read

Figure 3 Yearly cache writes

Figure 4 Yearly write pending tracks

Listing 1 Configuration file

# Orca configuration file for orcallator files.
#
# This config file is for processing Symmetrix performance logs
# Tom Kranz

base_dir        /usr/local/orca/var/orca/rrd/orcallator
rrd_dir            .
state_file        symmetrix.orca.state
html_dir        /usr/local/orca/html/symmetrix
expire_images        1
find_times        0:10 1:00 6:00 12:00 19:00
# This defines who gets warning emails - set to your address
warn_email        <your_sysadmin_email_address>
late_interval        interval + 30

# Here we define the stats group and the data to plot
group symmetrix {
find_files        /usr/local/orca/logs/symmetrix/probe-\d{4}-\d{2}-\d{2}(?:-\d{3,})?(?:\.(?:Z|gz|bz2))?
column_description    first_line
date_source        column_name timestamp
interval        300
}

html_top_title        Symmetrix stats

html_page_header
    <font face="Arial,Helvetica">
        Symmetrix performance stats
    </font>

html_page_footer
    <font face="Arial,Helvetica">
        These plots brought to you by your local system administrator.
    </font>

plot {
title            I/O Reads
source            symmetrix
data            (.*\d)ioread
legend            $1
y_legend        I/O Read/sec 
data_type        gauge
required        1
plot_min        0
line_type        line2
}

plot {
title            I/O Writes
source            symmetrix
data            (.*\d)iowrite
legend            $1
y_legend        I/O Writes/sec
data_type        gauge
required        1
plot_min        0
line_type        line2
}

plot {
title            KB read
source            symmetrix
data            (.*\d)kbread
legend            $1
y_legend        KB Reads/sec
data_type        gauge
required        1
plot_min        0
line_type        line2
}

plot {
title            KB written
source            symmetrix
data            (.*\d)kbwrite
legend            $1
y_legend        KB written/sec
data_type        gauge
required        1
plot_min        0
line_type        line2
}

plot {
title            Cache Reads
source            symmetrix
data            (.*\d)cacheread
legend            $1
y_legend        Cache reads/sec
data_type        gauge
required        1
plot_min        0
line_type        line2
}

plot {
title            Cache Writes
source            symmetrix
data            (.*\d)cachewrite
legend            $1
y_legend        Cache Writes/sec
data_type        gauge
required        1
plot_min        0
line_type        line2
}

plot {
title            Sequential Reads
source            symmetrix
data            (.*\d)seqread
legend            $1
y_legend        Sequential Reads
data_type        gauge
required        1
plot_min        0
line_type        line2
}

plot {
title            Write Pending Tracks
source            symmetrix
data            (.*\d)wptracks
legend            $1
y_legend        Write Pending Tracks
data_type        gauge
required        1
plot_min        0
line_type        line2
}

Listing 2 orca-symmetrix_probe.pl

#!/usr/local/bin/perl
#
# Tom Kranz
#
# With thanks to Mick Sheppard for his Perl skillz!
#
# This script calls symstat, pulls the required data out of its
# output and then prints that to a log file.
# That is then copied to Orac via rsync for plotting by Orca.


# We need to set up several date variables first
($Second,$Minute,$Hour,$Month_Day,$Month,$Year,$Week_Day,$IsDST) =
  (localtime)[0,1,2,3,4,5,6,8];
$realyear=$Year+1900;
$realmon=$Month+1;
$monlength=length($Month_Day);
if ( $monlength == 1 ) {
        $Month_Day="0$Month_Day";
};
$realmonlen=length($realmon);
if ( $realmonlen == 1 ) {
    $realmon="0$realmon";
};

# We need a nice ISO format date for the log file, and we also need an 
# epoch timestamp for the first data column - Orca can then use this 
# to track updates
$gdate="$realyear-$realmon-$Month_Day";
$epochtime=time;

# Here's our output file
>/var/log/probe/probe-$gdate");">
open(PROBEOUT,">>/var/log/probe/probe-$gdate")
    or die "Can't open /var/log/probe/probe-$gdate: $!\n";

# Now we have some arrays to create the column headings.
# Note that volumes 00A and 00B are listed as 010 and 011 (their
# decimal values): the data patterns in the Orca config, such as
# (.*\d)ioread, need a digit just before the statistic name, so
# Orca doesn't plot them otherwise
@logicaldisks=("001", "002", "003", "004", "005", "006", "007",
               "008", "009", "010", "011");
@columns=("ioread", "iowrite", "kbread", "kbwrite", "cacheread",
          "cachewrite", "seqread", "wptracks");

# Now we need to print the column headings to the output file
print PROBEOUT "timestamp ";
foreach $name (@logicaldisks) {
    foreach $column (@columns) {
        print PROBEOUT "$name$column ";
    };
};
print PROBEOUT "\n";

# Now we add our timestamp for Orca to keep track of when new data 
# was added
print PROBEOUT "$epochtime ";

# And now we take in the results from the symstat command, and do
# some munging before printing them out to the output file
foreach $name (@logicaldisks) {
    # Note: here we need to change the volume IDs back to hex so that
    # we can spot them in the output of symstat. We work on a copy so
    # that @logicaldisks itself is left untouched
    my $dev = $name;
    if ( $dev eq "010" ) {
        $dev="00A";
    } elsif ( $dev eq "011" ) {
        $dev="00B";
    }
    # Now we start reading in the data from symstat
    $probe=open(PROBEIN,"/opt/emc/SYMCLI/4.0.1/bin/symstat -i 10 " .
                "-c 2 -dev $dev |");

    my $found = 0 ;
    while( <PROBEIN> ) {
        chomp;
        if( /([0-9A-F]{3})\s+\(rdmp\/(\w+)\*\)(.*)$/ ) {
            $found = 1;
            $volumeID = $1 ;
            $devPath = $2 ;
            @results = split( /\s+/, $3 ) ;
            for( $idx = 0 ; $idx <= $#results ; $idx ++ ) {
                if( $results[ $idx ] eq "N/A" ) {
                    $results[ $idx ] = 0 ;
                }
            }
            print PROBEOUT join( ' ', @results ) ;
            last ;
        }
    }
    close( PROBEIN ) ;

    # symstat often doesn't come back for all volumes, so we need to pad
    # the data with zeros
    if ( $found == 0 ) {
        print PROBEOUT "0 0 0 0 0 0 0 0" ;
    }
    # Just in case we're missing a space somewhere between data sets 
    # on the line - Orca isn't fazed (it appears to be matching on \s+)
    print PROBEOUT " " ;
}

# And we end the line, ready for the next round of column headings
print PROBEOUT "\n";

# Always tidy up after yourself
close(PROBEOUT);

Listing 3 Getlogs

#!/bin/ksh
#
# Tom Kranz
#
# getlogs
#    Get the log files from the server with the Symm and run 
#    Orca to generate the performance HTML pages.

# Running into limits on open files, so ....
ulimit -n 1024

# Use rsync to copy across the files
# NOTE: change <your_symmetrix_host> to the IP of the remote host
# that's connected to the Symmetrix
/usr/local/bin/rsync -v -a --rsh=/usr/local/bin/ssh \
    --rsync-path=/usr/local/bin/rsync \
    --timeout=60 \
    <your_symmetrix_host>:/var/log/probe/ \
    /usr/local/orca/logs/symmetrix

# Run ORCA
/usr/local/orca/bin/orca -o /usr/local/orca/lib/symmetrix_orcallator.cfg

# Copy the files across to your live html directory
# NOTE: change <your_live_html_dir> to the directory where your web
# server looks for its HTML files
cp -rp /usr/local/orca/html/symmetrix/* <your_live_html_dir>

# We're done!
exit 0
