- Lord Kelvin
I. Introduction:
Every so often, business units ask their IT counterparts very general questions about the capacity of the current IT infrastructure, especially as it relates to customer-facing applications. Although these questions are often vague, answering them still requires diligence on the part of the IT manager. Since the questions recur with reasonable regularity, it is prudent for the IT manager to devise a measurement strategy to baseline the capacity of IT assets. Quite a few IT managers think that creating a baseline is an expensive proposition requiring high-end consultants from a top-notch consulting firm. Nothing could be further from the truth. There are simple, commonsense ways to get the baseline measurements done. In this post, I discuss a simple and generalized methodology for measuring the baseline performance of IT infrastructure within an organization.
Developing a measurement baseline is usually the first phase of a performance analysis and optimization plan. This is akin to assessing the current state of affairs. As I have often said, “you cannot optimize if you do not know your current capability”. Therefore, the intent of creating a measurement baseline is to develop a benchmark of current usage for both hardware and software resources. Once you have a benchmark, you can carry out additional performance analysis to determine the optimum configuration and response times for various scenarios. That, of course, is a future topic for discussion.
II. Procedure:
At a high level, the following steps are required to develop a measurement baseline:
- Identify the resources to be monitored and determine the role or type of work performed by each of them.
- Determine the workload characterization.
- Select appropriate objects and their associated counters for measuring performance.
- Automate data collection by creating logging schedules, and load the data into a suitable format for subsequent analysis.
- Sample the data at the appropriate intervals.
- Generate descriptive statistics and line graphs to identify peaks and valleys of system usage.
- Present the findings using a format relevant to the discussions.
The following sections detail each of the above steps:
- Identify Resources and Roles: An accurate measurement baseline of current use can only be developed by monitoring the resources that currently host software applications. The right approach is to use passive monitors on resources that are part of the current production configuration. Care must be taken not to disrupt your steady state of operations.
- Workload Characterization: The next task is to characterize the infrastructure workload by studying your clustering and load-balancing heuristics. Primary servers will typically receive a larger share of requests than secondary servers. Understanding the topology of the environment is key to an accurate analysis of current usage.
- Selection of Performance Counters: The advantage of most applications in today’s environment is that they are instrumented off the shelf. This means that the administrator rarely has to devise software widgets to accurately measure the performance of these applications. However, the accuracy of the baseline depends on the right selection of performance counters. Selecting too many counters yields a data overload that is very cumbersome to analyze using standard methods. On the other hand, selecting too few counters defeats the point of the whole endeavor. A good and experienced administrator should be able to pick the right mix of counters.
- Data Collection: Automate the collection of data as much as possible. In all of the standard operating system environments, provision exists to automatically collect periodic data points and store them in text files. These text files can later be imported into spreadsheets or databases for the actual analysis. Periodically check to ensure that the data is indeed being collected and stored appropriately. It is also important to have these raw data files appropriately named with the date and time of the log data for easier stratification later. Failure to do so will almost surely lead to confusion and possibly erroneous conclusions.
- Sampling Period: Once you have the basic data collection framework in place, it is important to choose the correct sampling period. It is critical that the sampling period encompasses any cyclical nature of the business. For instance, there may be a heavy volume of transactions during the first Monday of the month or the last Friday of the month. Or it could be that the pattern of usage is different for each day of the week. To complicate matters further, there also may be different usage patterns during different times during the day.
- Descriptive Statistics: Decide on the descriptive statistics to be used for data analysis. It is important not to go overboard with your analysis unless it is a complex real-time trading system where every transaction is critical to the business success of the enterprise. For the most part, summary statistics such as the mean, median and mode, supplemented by basic measures based on the normal distribution such as the standard deviation and confidence intervals, are generally sufficient.
- Presentation: All of the analysis will be for naught if you cannot convince the decision makers of the accuracy and relevance of your findings. Therefore, use simple trend graphs, histograms, Pareto charts, and pie charts to convert the analyzed data into meaningful information.
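The data collection step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed tool: the helper names `log_path` and `append_sample`, the CSV column layout, and the host and counter names are all hypothetical. The sketch assumes that an operating system scheduler calls `append_sample` once per sampling interval, and it writes one file per host per day so that the raw data files carry the date in their names, with the time of each reading stored in the row itself.

```python
import csv
from datetime import datetime
from pathlib import Path


def log_path(log_dir, host, when):
    """Build a per-host, per-day file name such as web01_2024-03-04.csv."""
    return Path(log_dir) / f"{host}_{when:%Y-%m-%d}.csv"


def append_sample(log_dir, host, counter, value, when=None):
    """Append one timestamped counter reading to that day's log file.

    The value itself is supplied by the caller; how it is read from the
    system is outside the scope of this sketch.
    """
    when = when or datetime.now()
    path = log_path(log_dir, host, when)
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.writer(handle)
        if is_new:
            # Write the header once so the file imports cleanly into a spreadsheet.
            writer.writerow(["timestamp", "host", "counter", "value"])
        writer.writerow([when.isoformat(timespec="seconds"), host, counter, value])
    return path
```

A scheduled job might call `append_sample("logs", "web01", "cpu_percent", reading)` every few minutes, where `reading` comes from whatever counter source the environment provides. Because each file name encodes the host and the date, the files can later be grouped by day of the week or day of the month without opening them.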
The seven steps discussed above form a simple process that can be carried out at minimal cost. Almost all organizations have access to some sort of spreadsheet software, desktop database software, and rudimentary presentation and graphing software. Creating a measurement framework for the first time will probably require some investment in thinking through the process. Once the measurements are validated, calibration of the framework is usually straightforward and much less cumbersome.
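For readers who prefer a script to a spreadsheet, the descriptive statistics step can be sketched with nothing more than the Python standard library. The function names `summarize` and `mean_by_hour` are hypothetical, the sample readings are made up purely for illustration, and the confidence interval uses a normal approximation, which is an assumption that suits large samples better than small ones.

```python
from collections import defaultdict
from datetime import datetime
from statistics import NormalDist, mean, median, mode, stdev


def summarize(values, confidence=0.95):
    """Return basic descriptive statistics for a list of counter readings."""
    avg = mean(values)
    spread = stdev(values)  # sample standard deviation
    # Normal-approximation interval for the mean: avg +/- z * s / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * spread / len(values) ** 0.5
    return {
        "mean": avg,
        "median": median(values),
        "mode": mode(values),  # first mode encountered if there are several
        "stdev": spread,
        "ci_low": avg - margin,
        "ci_high": avg + margin,
    }


def mean_by_hour(samples):
    """Average (timestamp, value) samples per hour of day to expose peaks and valleys."""
    buckets = defaultdict(list)
    for stamp, value in samples:
        buckets[stamp.hour].append(value)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}


# Illustrative readings only (for example, CPU percent on one server).
samples = [
    (datetime(2024, 3, 4, 9, 0), 40),
    (datetime(2024, 3, 4, 9, 30), 50),
    (datetime(2024, 3, 4, 14, 0), 80),
    (datetime(2024, 3, 4, 14, 30), 90),
    (datetime(2024, 3, 4, 22, 0), 20),
    (datetime(2024, 3, 4, 22, 30), 20),
]
summary = summarize([value for _, value in samples])
hourly = mean_by_hour(samples)
peak_hour = max(hourly, key=hourly.get)
```

The hourly averages are the kind of series that would feed a simple line graph, and `peak_hour` picks out the busiest hour in the sample. Grouping by weekday or day of the month instead of by hour follows the same pattern.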