
Total Event Count in TBSM

Tuesday, April 26th, 2016

Introduction

Tivoli Business Service Manager can calculate amazing things for you, if only you need them. This is thanks to its powerful rules engine, the key part of TBSM, and to the Netcool/Impact policy engine that runs under the hood in every TBSM edition. You can later present your calculation results on a dashboard or in reports, depending on whether you have a real-time scorecard or historical KPI reports in mind.

In this article, I'll show how to calculate a total event count throughout a multi-level service tree. TBSM doesn't do this right after a fresh install, because it doesn't provide the right rules out of the box. It also doesn't come with any predefined service tree, so to see this working you need to do both: add the rules to your service templates, and import or create by hand a service tree structure to test them on.

In this material, I'll create a simple, multi-level service tree consisting of 3 levels of instances, and I'll use my own template, T_Regions. To repeat this exercise, you can also simply reuse the SCR_ServiceComponentRawStatusTemplate template, which comes with every TBSM installation and is widely used in integrations with Tivoli Application Dependency Discovery Manager (TADDM). The key requirement is that your template must:

  • Have at least 1 incoming status rule
  • Be in use across the whole service tree, so all service instances on all levels in your service tree implement that template.

Figure 1. Template settings for this exercise

Figure 2. Incoming Status Rule body used in this exercise

Figure 3. Simple service tree used in this material

Make note. This document reimplements functionality that already exists: TBSM already calculates the total number of events on every service tree level and stores the result in the numRawEventsInt parameter. That parameter is visible as the last value in the RAD_prototype widget, which is typically used on Custom Canvases in TBSM dashboards created in Tivoli Integrated Portal. However, its value isn't accessible to numerical rules or policies for further processing.

Figure 4. numRawEventsInt value used on RAD_prototype widget

The newest add-on to TBSM, the Spy debugging tools, also offers a parameter called Matching Events for every service tree level. Although that value is correct too, it isn't accessible from numerical rules or policies either.

Make note. There is a BSM Accelerator template, called BSMAccelerator_EventCount, which was designed to present the correct number of events for every service instance. However, it was tailored to the BSM Accelerator's needs and service tree structure, and it doesn't scale to service trees of arbitrary depth. Some of the concepts introduced to support the BSM Accelerator package are nevertheless covered in this document. If you want to read more, see this document:

http://www.ibm.com/support/knowledgecenter/SSSPFK_6.1.1.3/com.ibm.tivoli.itbsm.doc/bsma/10/bsm_tbsm_BSMAcceleratorEventCount_rules.html

Make note. TBSM 6.1.1 FP3 is a prerequisite for all rules described in this material to work correctly. However, installing Fix Pack 4 or higher is highly recommended to get the latest improvements.

What is the multi-level events count?

TBSM runs an Impact service called TBSMOMNIbusEventReader, which comes with the product out of the box. It reads events from Netcool/OMNIbus on a regular basis (every 3000 milliseconds by default) and uses its special predefined filter to find the events that TBSM should process.

Here’s the default filter:

(Class <> 12000) AND (Type <> 2) AND ((Severity <> RAD_RawInputLastValue) or (RAD_FunctionType = '')) AND (RAD_SeenByTBSM = 0) AND (BSM_Identity <> '')
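To make the filter logic a bit more tangible, here's a minimal Python sketch that evaluates the same boolean expression against an event represented as a plain dictionary. It is only a model: the function name passes_reader_filter and the dictionary representation are my own assumptions, and the sketch ignores SQL NULL handling and how the ObjectServer actually evaluates filters.

```python
def passes_reader_filter(event: dict) -> bool:
    """Hypothetical model of the default TBSMOMNIbusEventReader filter.

    Each clause mirrors one parenthesised condition of the OMNIbus filter above.
    """
    return (
        event["Class"] != 12000
        and event["Type"] != 2
        and (
            event["Severity"] != event["RAD_RawInputLastValue"]
            or event["RAD_FunctionType"] == ""
        )
        and event["RAD_SeenByTBSM"] == 0
        and event["BSM_Identity"] != ""
    )


# A new, not yet processed event for the Malopolska service instance.
new_event = {
    "Class": 0,
    "Type": 1,
    "Severity": 5,
    "RAD_RawInputLastValue": 0,
    "RAD_FunctionType": "",
    "RAD_SeenByTBSM": 0,
    "BSM_Identity": "Malopolska",
}

print(passes_reader_filter(new_event))                          # True
print(passes_reader_filter(dict(new_event, RAD_SeenByTBSM=1)))  # False: already seen by TBSM
print(passes_reader_filter(dict(new_event, BSM_Identity="")))   # False: no identity set
```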

All events that pass that filter are processed further by TBSM service template rules, specifically by a special kind of rule called Incoming Status Rules. The most typical incoming status rule, ComponentRawEventStatusRule, predefined in the SCR_ServiceComponentRawStatusTemplate template, has a precondition called a discriminator. It discards every event let through by the event reader whose class is not one of the following:

  • TPC Rules(89200),
  • IBM Tivoli Monitoring Agent(87723),
  • Predictive Events(89300),
  • IBM Tivoli Monitoring(87722),
  • Default Class(0),
  • TME10tecad(6601),
  • Tivoli Application Dependency Discovery Manager(87721),
  • Precision [Start](8000),
  • MTTrapd(300),
  • Precision [End](8049)
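The discriminator can be modeled as a simple set-membership check. This Python sketch is hypothetical (the names ACCEPTED_CLASSES and passes_discriminator are mine) and only models the listed values; whether classes between 8000 and 8049 are also accepted is not stated in the list, so they are not included here.

```python
# Event classes accepted by the ComponentRawEventStatusRule discriminator, as listed above.
ACCEPTED_CLASSES = {
    89200,  # TPC Rules
    87723,  # IBM Tivoli Monitoring Agent
    89300,  # Predictive Events
    87722,  # IBM Tivoli Monitoring
    0,      # Default Class
    6601,   # TME10tecad
    87721,  # Tivoli Application Dependency Discovery Manager
    8000,   # Precision [Start]
    300,    # MTTrapd
    8049,   # Precision [End]
}


def passes_discriminator(event_class: int) -> bool:
    """Return True if an event with this Class would be handled by the rule (model only)."""
    return event_class in ACCEPTED_CLASSES


print(passes_discriminator(0))      # True: Default Class, used by the test events in this post
print(passes_discriminator(12000))  # False: not in the list
```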

Make note. In my example, my Incoming Status rule simply expects Default Class (0) in all my test events.

This is not the end. There's one more filter, called the event identification field. By default, TBSM looks for its value in the event field called BSM_Identity. The value expected in that field comes from each service instance's event identifier, which by default is the same as the service instance name. So the event identifiers for my simple service tree are the following:

Service instance name | Event identifier
Europe | Europe
Poland | Poland
Malopolska | Malopolska
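The third filter boils down to comparing the event's BSM_Identity with each instance's event identifier. Here's a hypothetical Python model of that comparison for the tree above (affected_instances is my own illustrative helper, not a TBSM API), assuming the default where the identifier equals the service instance name:

```python
# Default: each instance's event identifier equals its service instance name.
service_instances = ["Europe", "Poland", "Malopolska"]
event_identifiers = {name: name for name in service_instances}


def affected_instances(event: dict) -> list:
    """Return the service instances whose event identifier equals the event's BSM_Identity."""
    bsm_identity = event.get("BSM_Identity", "")
    return [name for name, ident in event_identifiers.items() if ident == bsm_identity]


print(affected_instances({"BSM_Identity": "Poland"}))  # ['Poland']
print(affected_instances({"BSM_Identity": "Krakow"}))  # []
```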

In this material, I will not discuss how to maintain event identifiers, how many event identifiers you can have, or how to set up event identifiers in XMLtoolkit configuration files (if you're interested in those topics, please see my private blog entry: http://www.marcinpaluch.pl/wordpress/?p=231). I will also not discuss how the event severity may affect the service instance status; I'm using the defaults in my example and won't focus on that area this time.

To sum it up: there are 3 filters your event has to pass before it affects your service instance:

  1. The TBSMOMNIbusEventReader’s filter
  2. The Incoming Status Rule discriminator / event class filter
  3. The event identifier

If your event made it through all the filters, you can call it a service instance affecting event.

This doesn't necessarily mean that your event changes your service instance status; it only means that your event was processed by the Incoming Status Rule implemented in your service instance's template. If you use TBSM 6.1.1 FP4, you can use the Service Model Spy tool to check that your Incoming Status rule updated various attributes, such as Matching Events (a number), Max Event Status (the event's severity) and a timestamp of when the rule processed the event.

The Matching Events parameter is what I’ll be calling in this material the EventCount.

Now, why a multi-level event count?

Every service instance can have its own individual EventCount. Every level of the service tree can contain more than one service instance, and the best way to add them up is to calculate their sum on their parent level. The parent service instance may also implement a template with an Incoming Status rule, so it can have its own individual EventCount as well. The parent can in turn be one of many sibling service instances, so the best way to add those up is to calculate a TotalEventCount on the grandparent level. And so on. So the multi-level event count is a feature that calculates the total number of events processed by TBSM in the whole service tree.
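The definition above is recursive: a node's total is its own EventCount plus the totals of all its children. Here's a minimal Python sketch of that definition on a hypothetical nested-dictionary model of my tree; the counts match the state reached in the walkthrough later in this post (2 events on the leaf, 2 in the middle, 1 on the root).

```python
# Hypothetical nested-dict model of the service tree; "own" is an instance's EventCount.
tree = {
    "name": "Europe", "own": 1, "children": [
        {"name": "Poland", "own": 2, "children": [
            {"name": "Malopolska", "own": 2, "children": []},
        ]},
    ],
}


def total_event_count(node: dict) -> int:
    """Own events of this instance plus the total event count of every child subtree."""
    return node["own"] + sum(total_event_count(child) for child in node["children"])


print(total_event_count(tree))                 # 5 (Europe: 1 + 2 + 2)
print(total_event_count(tree["children"][0]))  # 4 (Poland: 2 + 2)
```

TBSM itself does not evaluate this recursively in one pass; the rules below compute it level by level.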

Why would you need it? There are several use cases possible:

  • Your service tree consistency check and verification – in a development phase, to see if all levels of your service tree get processed correctly
  • Statistics – to see the current and true load on TBSM by source, class, alert type, any event field in order to perform some further analysis of event storms and their reasons
  • To monitor the operations – for example to compare total events count to total acknowledged events count to total count of events escalated by opening an incident etc.
  • To monitor service component quality – especially important when service components are managed or provided by a 3rd-party provider – you can assess how much trouble they give your company or your operations team

Once the use case is agreed, you may want to use this material to start collecting your total event counts and present them on a dashboard or in a report. Let me now explain how to set it up.

Implementation

As the first step, let's make sure I'm collecting the event count for each of my service tree elements. Let me create a new rule for that: OwnEvents.

Make note. This step has a prerequisite: I need to have my Incoming Status rule already created.

This is not widely known, but every Incoming Status Rule can be used in a Numerical Formula rule to get the number of events it has processed. It is described in this technote:

http://www-01.ibm.com/support/docview.wss?uid=swg21626616

So let me do exactly what the technote describes. This is my numerical formula, a rule called OwnEvents, which returns only the count of non-clear events through the Incoming Status Rule parameter NumEventsSevGE2 (available by default since TBSM 6.1.1 FP1). Whenever my Incoming Status Rule processes another event with severity 1 or higher, the output of my numerical formula refreshes and increases by 1.

Figure 5. OwnEvents rule settings

And on my scorecard:

Figure 6. OwnEvents in a scorecard

Let’s send a test event to the last level now:

Figure 7. Sending test event

Figure 8. Test event settings

Figure 9. OwnEvents after sending test event

As you can see, the event's severity was propagated up through the whole service tree, which is why the icon in the Events column turned purple on every level, from the bottom one right up to the top one.

After I sent a critical event to the 2nd level, the icons from the 2nd level up to the top one turned red.

Figure 10. OwnEvents after sending 2nd test event

Make note. In order to perform this exercise, I haven’t created a status propagation rule. And I will not!

Take a look at the OwnEvents column. Even though the status was propagated through the service tree from the bottom to the top, the OwnEvents rule worked on every level individually. Europe shows bad events in the Events column, but its OwnEvents column shows that 0 events affected that level.

Now, let’s try to make every level aware of events happening on the level below it.

Prepare the following policy:

/* trigger_totalevents */
/* Log the current value of the trigger_totalevents rule (the aggregation rule created below). */
log("Triggered: " + ServiceInstance.STATEMODELNODE.trigger_totalevents.Value);

/* Start with this service instance's own event count. */
Status = 0;
si = ServiceInstance.SERVICEINSTANCENAME + " (" + ServiceInstance.DISPLAYNAME + ")";

if (ServiceInstance.STATEMODELNODE.count_ownevents.Value <> NULL) {
    Status = Int(ServiceInstance.STATEMODELNODE.count_ownevents.Value);
}

log("Service instance: " + si + " own events count: " + Status);

/* Add the events of every direct child. */
i = 0;
while (ServiceInstance.CHILDINSTANCEBEANS[i] <> NULL) {
    ci = ServiceInstance.CHILDINSTANCEBEANS[i].SERVICEINSTANCENAME + " (" + ServiceInstance.CHILDINSTANCEBEANS[i].DISPLAYNAME + ")";

    if (ServiceInstance.CHILDINSTANCEBEANS[i].NUMCHILDREN > 0) {
        /* The child has children of its own: use its total (own + descendants). */
        grandChildEvents = 0;
        if (ServiceInstance.CHILDINSTANCEBEANS[i].STATEMODELNODE.count_totalevents.Value <> NULL) {
            grandChildEvents = Int(ServiceInstance.CHILDINSTANCEBEANS[i].STATEMODELNODE.count_totalevents.Value);
        }
        log("Service instance: " + si + ", child: " + ci + " children events: " + grandChildEvents);
        Status = Status + grandChildEvents;
    } else {
        /* The child is a leaf: use its own event count. */
        childOwnEvents = 0;
        if (ServiceInstance.CHILDINSTANCEBEANS[i].STATEMODELNODE.count_ownevents.Value <> NULL) {
            childOwnEvents = Int(ServiceInstance.CHILDINSTANCEBEANS[i].STATEMODELNODE.count_ownevents.Value);
        }
        log("Service instance: " + si + ", child: " + ci + " own events: " + childOwnEvents);
        Status = Status + childOwnEvents;
    }

    i = i + 1;
}

/* Status holds the value returned by the count_totalevents numerical formula rule. */
log("Service instance: " + si + " total events count: " + Status);

I called this policy count_totalevents_policy_1 and saved it within a numerical formula rule called count_totalevents.

Figure 11. TotalEvents rule settings

At the same time, create another rule, a numerical aggregation rule, that points to the rule you just created within the same template. Make sure you name this rule exactly as indicated in the header of the policy in the numerical formula you created a moment ago (trigger_totalevents).

Figure 12. TriggerTotalEvents rule settings
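Note that the policy itself doesn't recurse: each level only reads its direct children's stored values, and the trigger_totalevents aggregation rule makes a parent re-evaluate when a child's count_totalevents changes. The Python sketch below models that behaviour under my own assumption that the trigger acts like "recompute this node, then its parent, up to the root". The Node class and the recompute, propagate and send_event functions are hypothetical illustrations, not TBSM APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    own: int = 0    # stands in for count_ownevents
    total: int = 0  # stands in for count_totalevents
    parent: Optional[Node] = field(default=None, repr=False)
    children: list[Node] = field(default_factory=list, repr=False)

    def add_child(self, child: Node) -> Node:
        child.parent = self
        self.children.append(child)
        return child


def recompute(node: Node) -> None:
    """Same arithmetic as the policy: own events plus, for each direct child,
    its total (if it has children) or its own count (if it is a leaf)."""
    status = node.own
    for child in node.children:
        status += child.total if child.children else child.own
    node.total = status


def propagate(node: Optional[Node]) -> None:
    """Model of the trigger: a changed total re-evaluates the parent, up to the root."""
    while node is not None:
        recompute(node)
        node = node.parent


def send_event(node: Node) -> None:
    node.own += 1
    propagate(node)


europe = Node("Europe")
poland = europe.add_child(Node("Poland"))
malopolska = poland.add_child(Node("Malopolska"))

# Same order as in this post: Malopolska, Poland, then one more event per level.
for target in (malopolska, poland, malopolska, poland, europe):
    send_event(target)

print(malopolska.total, poland.total, europe.total)  # 2 4 5
```

In TBSM the chain is driven by rule re-evaluation rather than by a loop; the while loop only stands in for that chain of triggers.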

By the end, you should have the following list of rules in your template:

Figure 13. T_Regions template complete rules set

Make note. After you create a template rule that points to its own template as a child template, the template disappears from the templates list in the Service Navigator portlet. To fix this, associate that template with any other template through any type of status propagation rule:

Figure 14. T_Regions template associated to templateFinder

And this is the result you should see in your scorecard:

Figure 15. TotalEvents column in a scorecard

It looks like the concept works. Let's test it further by sending another event to every level, starting with Malopolska, then Poland and then Europe.

Figure 16. TotalEvents column after sending more test events

It looks correct: the OwnEvents count on every level increased by 1, and there are 5 events in total in the entire tree: 2 on the leaf, another 2 in the middle and 1 on the root level.

Let's add a new level below Malopolska and call it Krakow. This simulates expanding the service tree, e.g. after a fresh import from TADDM or a CMDB.

Figure 17. OwnEvents and TotalEvents after adding a new child service

Let's now send a new event with Severity 3 to Krakow:

Figure 18. OwnEvents and TotalEvents after sending a test event to the new child service

The new event affected Krakow and was correctly included in the TotalEvents calculation on every level. Let's now create a new level above all the others, called Earth:

Figure 19. OwnEvents and TotalEvents after adding a new root service

Adding Earth didn't change the TotalEvents count, of course, but the current maximum was reflected on the new top (root) level. Let's send another event to Poland:

Figure 20. OwnEvents and TotalEvents after sending test events to the new root service

The total event count increased by 1 again. Only Europe’s OwnEvents column value increased by 1.

Let's now remove Krakow from the leaf level to see whether the TotalEvents count decreases by 1:

Figure 21. OwnEvents and TotalEvents after removing the child service from the tree

So it is correct again: after removing Krakow with its 1 event, the overall TotalEvents count dropped by 1 as well and now equals 6.

That's it. If you like this post, let me know; the same goes for any questions.

Take care until the next one!

mp

TBSM multiple event identifiers

Friday, August 14th, 2015

This blog entry explains how to set up and use multiple event identifiers in IBM Tivoli Business Service Manager 6.1.1 FP3 and earlier releases.

Introduction

The official documentation doesn’t say much about multiple event identifiers.

http://www-01.ibm.com/support/knowledgecenter/SSSPFK_6.1.1.3/com.ibm.tivoli.itbsm.doc/ServiceConfigurationGuide/bsmu_srvt_edit_id_fields.html
http://www-01.ibm.com/support/knowledgecenter/SSSPFK_6.1.1.3/com.ibm.tivoli.itbsm.doc/ServiceConfigurationGuide/bsmu_isrt_create_good_marg_bad_rules.html
http://www-01.ibm.com/support/knowledgecenter/SSSPFK_6.1.1.3/com.ibm.tivoli.itbsm.doc/customization/bsms_dlkc_id_rules.html

So let's quickly summarize what we can do:
– we can set multiple event identifiers in an incoming status rule
– we can set multiple event identifiers in EventIdentifierRules.xml or in any XMLtoolkit artifact of category eventidentifiers

But how do we make sure they match, and how do we know that they do?

Solution

Multiple event identifiers in an Incoming Status Rule are combined as if there were a logical AND operator between them:

pic1

pic2

So the definition of the CAM_FailedRequestsStatusRule_TDW rule should be read as follows: all rows returned by CAM_RRT_SubTrans_DataFetcher will affect my service if the data returned in the following fields has the following values:

APPLICATION=MyApp AND
SUBTRANSACTION=MySubTrans AND
TRANSACTIONS=MyTrans

At the same time, multiple values for the same label mean OR.
For an incoming status rule like this one:

pic3

pic4

So my service instance expects the CAM_BSM_Identity_OMNI rule to catch all events that carry either of two alternative BSM_Identity values:

– MyApp#t#MyTrans#s#MySubTrans OR
– MySubTrans(5D21DD108FD43941892543AA0872D0EA)-cdm:process.Activity
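Both cases can be expressed in one small matching model: every label must match (AND), and any of a label's values may match (OR). This Python sketch is my own illustration of the behaviour described above; instance_matches is a hypothetical helper, and the identifier values are the ones from the two rules shown.

```python
from __future__ import annotations


def instance_matches(identifiers: dict[str, set[str]], event: dict[str, str]) -> bool:
    """Every label must match (AND); any value listed for a label may match (OR)."""
    return all(event.get(label) in values for label, values in identifiers.items())


# CAM_FailedRequestsStatusRule_TDW: three labels, one value each -> AND.
tdw_ids = {
    "APPLICATION": {"MyApp"},
    "SUBTRANSACTION": {"MySubTrans"},
    "TRANSACTIONS": {"MyTrans"},
}
# CAM_BSM_Identity_OMNI: one label, two alternative values -> OR.
omni_ids = {
    "BSM_Identity": {
        "MyApp#t#MyTrans#s#MySubTrans",
        "MySubTrans(5D21DD108FD43941892543AA0872D0EA)-cdm:process.Activity",
    },
}

row = {"APPLICATION": "MyApp", "SUBTRANSACTION": "MySubTrans", "TRANSACTIONS": "MyTrans"}
print(instance_matches(tdw_ids, row))                                   # True
print(instance_matches(tdw_ids, dict(row, TRANSACTIONS="OtherTrans")))  # False
print(instance_matches(omni_ids, {"BSM_Identity": "MyApp#t#MyTrans#s#MySubTrans"}))  # True
```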

In EventIdentifierRules.xml, there's a concept of policies and rules, for example:

<Policy name="ITM6.x">
  <Rule name="ManagedSystemName">
    <Token keyword="ATTR" value="cdm:ManagedSystemName"/>
  </Rule>
</Policy>
<Mapping policy="ITM6.x" class="%" />

You can have many policies mapped to many classes (which in turn can be mapped to many templates), and you can have many rules within each policy.


In our case, for the ITCAM Tx subtransaction class, there is one policy with several rules:

<Policy name="CAM_SubTransaction_Activity">
  <Rule name="CAM_GetBSM_Identity">
    <Token keyword="ATTR" value="cdm:ActivityName" />
    <condition operator='like' value='%#s#%' />
  </Rule>
  <Rule name="CAM_GetApplicationName" field="APPLICATION">
    <Relationship relationship='cdm:uses'
                  relationshipSource='cdm:process.Activity'>
      <Relationship relationship='cdm:federates'
                    relationshipSource='cdm:process.BusinessProcess'>
        <Token keyword="ATTR" value="cdm:ActivityName" />
      </Relationship>
    </Relationship>
  </Rule>
  <Rule name="CAM_GetTransactionName" field="TRANSACTIONS">
    <Relationship relationship='cdm:uses'
                  relationshipSource='cdm:process.Activity'>
      <Token keyword="ATTR" value="cdm:Label" />
    </Relationship>
  </Rule>
  <Rule name="CAM_GetSubTransactionName" field="SUBTRANSACTION">
    <Token keyword="ATTR" value="cdm:Label" />
    <condition operator='like' value='%#s#%' />
  </Rule>
</Policy>

and one mapping of that policy to a class:
<Mapping policy="CAM_SubTransaction_Activity" class="cdm:process.Activity" />

But one class can have many policies mapped to it:
<Mapping policy="CAM_Transaction_Activity" class="cdm:process.Activity" />
<Mapping policy="CAM_SubTransaction_Activity" class="cdm:process.Activity" />
<Mapping policy="CAM_TT_Object" class="%" />

In other words, each policy mapped to a class acts like one operand of a logical OR, and each rule acts like one operand of a logical AND with the other rules in the same policy.

All of this is conditional, though, because of one more aspect: the field parameter of the <Rule> tag.

The field parameter.

The field parameter of a rule in a policy in EventIdentifierRules.xml means that the rule is used only if a field with the same name is also selected as a Service Instance Name field in the Incoming Status Rule.

So the rules of a policy in EventIdentifierRules.xml whose fields haven't been specified in the template's Incoming Status Rule don't take part in the AND operation.

On the other hand, no value will be assigned to the Service Instance Name fields selected in the Incoming Status Rule in Template A if the corresponding fields haven't been configured in the rules of the policies in EventIdentifierRules.xml that are mapped to the class (the class mapped to Template A in CDM_TO_TBSM4x_MAP_Templates.xml).
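Putting the OR, AND and field-parameter behaviour together, here's a Python model of how I read these rules. It is a sketch under stated assumptions, not how TBSM is implemented: Rule, identifier_sets, unassigned_fields and event_matches are hypothetical names; the rule values (MyApp, MyTrans, MySubTrans) are borrowed from the earlier example as if the rules had extracted them; HOSTNAME is a made-up field; and CAM_GetBSM_Identity, which has no field parameter, is left out of the model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    field: str  # the field="..." attribute of <Rule>
    value: str  # value the rule would extract for one CI (hypothetical here)


def identifier_sets(policies: dict[str, list[Rule]], isr_fields: set[str]) -> list[dict[str, str]]:
    """One alternative identifier set per policy mapped to the class (OR between sets).
    Inside a set, only rules whose field is selected in the ISR take part (AND)."""
    sets = []
    for rules in policies.values():
        ident = {rule.field: rule.value for rule in rules if rule.field in isr_fields}
        if ident:
            sets.append(ident)
    return sets


def unassigned_fields(policies: dict[str, list[Rule]], isr_fields: set[str]) -> set[str]:
    """ISR fields for which no rule in any mapped policy provides a value."""
    provided = {rule.field for rules in policies.values() for rule in rules}
    return isr_fields - provided


def event_matches(sets: list[dict[str, str]], event: dict[str, str]) -> bool:
    return any(all(event.get(f) == v for f, v in ident.items()) for ident in sets)


policies = {
    "CAM_SubTransaction_Activity": [
        Rule("CAM_GetApplicationName", "APPLICATION", "MyApp"),
        Rule("CAM_GetTransactionName", "TRANSACTIONS", "MyTrans"),
        Rule("CAM_GetSubTransactionName", "SUBTRANSACTION", "MySubTrans"),
    ],
}
all_fields = {"APPLICATION", "TRANSACTIONS", "SUBTRANSACTION"}
row = {"APPLICATION": "MyApp", "TRANSACTIONS": "MyTrans", "SUBTRANSACTION": "MySubTrans"}

# ISR selects APPLICATION only: the other two rules don't take part in the AND.
print(identifier_sets(policies, {"APPLICATION"}))                 # [{'APPLICATION': 'MyApp'}]
# ISR selects a field that no rule provides: it gets no value.
print(unassigned_fields(policies, {"APPLICATION", "HOSTNAME"}))   # {'HOSTNAME'}
print(event_matches(identifier_sets(policies, all_fields), row))  # True
```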

Conclusion.

You need to configure your event identifiers in two places:

  1. Templates and their incoming status rules / numerical rules / text rules – Service Instance Name fields
  2. XMLtoolkit artifact EventIdentifierRules.xml (or any custom artifact from category eventidentifiers) – field parameters in rules defined within policies

Additionally, don't forget: the policies defined in your eventidentifiers artifacts must be mapped to CDM or custom classes that have a mapping definition in CDM_TO_TBSM4x_MAP_Templates.xml, and those classes must map to the same template that contains the incoming status rule (or numerical/text rule) with the Service Instance Name fields you want.

Otherwise, your events, or the KPIs retrieved by your fetchers, won't affect your service tree elements, you won't show the correct status or availability on dashboards, and your outage reports will miss data and produce false monthly results!

Drop me a note if you have any trouble reading this article or applying what's written here, thanks!

mp