Ever since I set up Puppet, I've noticed that its memory usage just keeps growing. Searching the web, I found that Puppet has memory leaks. One common suggestion was to run Puppet from cron instead.
I considered it, but I didn't want to go that route. Next, I thought about having Fabric restart Puppet once a month. In theory, that should work, but when I was ready to put it in, all the possible problems with doing this popped into my head, so I didn't proceed. What next?
Well, since we already use Nagios to monitor all of our machines, why not have it monitor the puppet process, get a rough estimate of its memory utilization, and automatically restart Puppet if it exceeds a certain threshold? And so, here it is:
#!/usr/bin/perl
# Author: Gou
# Purpose: Nagios check that restarts puppet if its memory usage is too high
use strict;
use warnings;

# Threshold for puppetd's %MEM (percentage of physical memory)
my $default = 10;

# Count puppetd processes; the [p] pattern keeps grep from matching itself
chomp(my $pcount = `ps aux | grep '[p]uppetd' | wc -l`);

if ($pcount == 1) {
    # The 4th column of `ps aux` output (index 3) is %MEM
    my @Results = split(" ", `ps aux | grep '[p]uppetd'`);
    my $memResults = $Results[3];
    if ($memResults > $default) {
        # The Nagios user needs a sudoers entry for this command
        `sudo /etc/init.d/puppet restart`;
        print "WARNING - Memory utilization of $memResults% is too high, restarting puppet\n";
        exit 1;
    }
    print "OK - Memory utilization is $memResults%\n";
    exit 0;
}
elsif ($pcount == 0) {
    print "CRITICAL - puppet is not running\n";
    exit 2;
}
else {
    print "CRITICAL - Found $pcount puppet processes\n";
    exit 2;
}
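For anyone who would rather not use Perl, here is a minimal Python sketch of the same check. It assumes Python 3.7+ (for `subprocess.run(..., capture_output=True)`), that the agent still shows up as `puppetd` in `ps aux`, and it reuses the 10% threshold and restart command from the Perl script. The parsing is split into a pure function so it can be tried against saved `ps aux` output.

```python
import subprocess
import sys

OK, WARNING, CRITICAL = 0, 1, 2
THRESHOLD = 10.0  # %MEM limit, same value as $default in the Perl script


def puppet_mem(ps_output, name="puppetd"):
    """Return the %MEM values of processes whose command contains `name`,
    given the text printed by `ps aux`."""
    values = []
    for line in ps_output.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 10)  # COMMAND (11th field) may contain spaces
        if len(fields) == 11 and name in fields[10]:
            values.append(float(fields[3]))  # 4th column is %MEM
    return values


def check(ps_output, threshold=THRESHOLD):
    """Return (exit_code, message) following the Nagios plugin conventions."""
    values = puppet_mem(ps_output)
    if len(values) != 1:
        return CRITICAL, "CRITICAL - Found %d puppet processes" % len(values)
    mem = values[0]
    if mem > threshold:
        return WARNING, "WARNING - Memory utilization of %.1f%% is too high" % mem
    return OK, "OK - Memory utilization is %.1f%%" % mem


if __name__ == "__main__":
    ps = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    code, message = check(ps)
    if code == WARNING:
        # Same restart command as the Perl script; needs a sudoers entry
        subprocess.call(["sudo", "/etc/init.d/puppet", "restart"])
        message += ", restarting puppet"
    print(message)
    sys.exit(code)
```

Reading `ps aux` once and filtering in Python avoids the `grep -v grep` problem entirely, since no grep process is involved.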
Friday, October 4, 2013
Thursday, August 1, 2013
Qlikview Nagios Plugin
Here is a simple but powerful DOS batch file for use as a Nagios plugin to monitor Qlikview tasks.
You will first need to install net-snmp and then a Nagios client like NSClient++. After that, you need to follow the Qlikview instructions on how to enable SNMP.
Once SNMP is enabled, run this script on the Qlikview server itself (it queries 127.0.0.1:4721), passing the OID and the task name. The output should look like:
OK: Job myjob status is Waiting
Refer to the Qlikview documentation for more information. Enjoy!
@echo off
@setlocal enableextensions enabledelayedexpansion
REM Author : Gou
REM Date : 08/01/2013
REM Requires: net-snmp,NSClient++ and qlikview SNMP enabled
REM http://www.net-snmp.org/
REM
REM USAGE : This plugin takes 2 arguments
REM first argument is the OID (1.3.6.1.4.1.30764.1.2.2.1.1.3.n)
REM second argument is the name of the Qlikview Task
REM %script% %OID% %TaskName%
REM Query OID 1.3.6.1.4.1.30764.1.2.2.1.1.2.n to get the Task Name
REM Replace n with a number >0 to get individual Task Names/Task Status
REM Query OID 1.3.6.1.4.1.30764.1.2.2.1.1.3.n to get the Task Status
REM Last...Refer to the Qlikview Server Reference Manual for more details
REM Clear any inherited value, then run snmpget to get the job status
set "qvstat="
for /f "tokens=4" %%i in ('snmpget -v 1 -c public 127.0.0.1:4721 %1') do set qvstat=%%i
REM If nothing is returned, go to unknown
if "%qvstat%" == "" GOTO unknown
REM Do some parsing here to remove the quotes
set qvstat=%qvstat:~1,-1%
REM If the status matches, do something
if not x%qvstat:Waiting=%==x%qvstat% GOTO ok
if not x%qvstat:Running=%==x%qvstat% GOTO ok
if not x%qvstat:Aborting=%==x%qvstat% GOTO ok
if not x%qvstat:Finished=%==x%qvstat% GOTO ok
if not x%qvstat:Warning=%==x%qvstat% GOTO warn
if not x%qvstat:Failed=%==x%qvstat% GOTO err
REM If nothing matches, run unknown and exit
:unknown
REM Nagios treats exit 3 as UNKNOWN; use exit /B 3 (UNKNOWN) or exit /B 1 (WARNING) if you prefer
echo CRITICAL: Status of Job %2 is unknown
exit /B 2
:err
echo CRITICAL: Job %2 failed
exit /B 2
:ok
echo OK: Job %2 status is %qvstat%
exit /B 0
:warn
echo WARNING: Job %2 status is %qvstat%
exit /B 1
REM END
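The core of the batch file is two steps: take the 4th whitespace-separated token of the snmpget output (the quoted value in a line such as `... = STRING: "Waiting"`), then map it to a Nagios exit code. Here is a minimal Python sketch of that same logic, for illustration only; it assumes Python 3.7+ and `snmpget` on the PATH, and keeps the batch file's substring matching and rule order.

```python
import subprocess
import sys

OK, WARNING, CRITICAL = 0, 1, 2

# Checked in the same order as the batch file, using substring matches
# like its "x%qvstat:Waiting=%" tests.
STATUS_RULES = [
    ("Waiting", OK), ("Running", OK), ("Aborting", OK), ("Finished", OK),
    ("Warning", WARNING), ("Failed", CRITICAL),
]


def parse_status(snmpget_output):
    """Return the 4th whitespace-separated token with its quotes removed,
    e.g. 'Waiting' from '... = STRING: "Waiting"'; '' if there is none."""
    tokens = snmpget_output.split()
    if len(tokens) < 4:
        return ""
    return tokens[3].strip('"')


def evaluate(status, task):
    """Map a Qlikview task status to a Nagios (exit_code, message) pair."""
    for keyword, code in STATUS_RULES:
        if keyword in status:
            if code == OK:
                return OK, "OK: Job %s status is %s" % (task, status)
            if code == WARNING:
                return WARNING, "WARNING: Job %s status is %s" % (task, status)
            return CRITICAL, "CRITICAL: Job %s failed" % task
    # Like the batch file, an unknown or empty status is reported as CRITICAL
    return CRITICAL, "CRITICAL: Status of Job %s is unknown" % task


if __name__ == "__main__":
    oid, task = sys.argv[1], sys.argv[2]
    proc = subprocess.run(
        ["snmpget", "-v", "1", "-c", "public", "127.0.0.1:4721", oid],
        capture_output=True, text=True)
    code, message = evaluate(parse_status(proc.stdout), task)
    print(message)
    sys.exit(code)
```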
Wednesday, July 24, 2013
Qlikview SNMP
By default, Qlikview provides a simple way of alerting someone if a job fails. You can see this in the web console where it says to enter an e-mail address.
Although this works, I prefer to use Nagios for checking. To do that, however, I need some way of talking to Qlikview. Fortunately, Qlikview provides the job status via SNMP. Unfortunately, the documentation only states that SNMP is available and says little more than that.
So, after mucking around with it for a few days and not getting very far with snmpwalk, I logged a ticket with Tech Support. Their initial response was that they didn't support snmpwalk.
When I called, the support engineer claimed never to have heard of SNMP, so I pointed her to the documentation, which surprised her.
I gave her a crash course on SNMP and asked her to check with someone and get back to me.
About 3 to 4 days later, I got a response, I think from second-tier support. He said he had used a GUI SNMP tool and it worked, and pointed me to http://www.manageengine.com/products/mibbrowser-free-tool/. OK, fine, that works, but I was still stuck because I was having trouble with snmpwalk and snmpget. By the way, he said the MIB file provided was DOA and sent a working copy, so don't be surprised if the MIB file you have fails to load.
AND IT WORKS!!
Finally, I figured it out.
First, install the net-snmp tools and then Cygwin (or use the DOS prompt if you prefer; for some reason, the DOS prompt doesn't require "-L o" but Cygwin does):
net-snmp tools
http://www.net-snmp.org/download.html
cygwin
http://www.cygwin.com
From the cygwin window, type:
snmpwalk -v 1 -c public -L o $hostname:4721 1.3.6.1.4.1.30764.1.2.2.1.1.3
(this returns something but it hangs; not sure why)
snmpget -v 1 -c public -L o $hostname:4721 1.3.6.1.4.1.30764.1.2.2.1.1.3.0
To get the next job's status, just increment the last number of the OID, e.g.:
1.3.6.1.4.1.30764.1.2.2.1.1.3.1
1.3.6.1.4.1.30764.1.2.2.1.1.3.2
1.3.6.1.4.1.30764.1.2.2.1.1.3.3
and so on.
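Walking the task table by hand gets tedious, so here is a small Python sketch that builds the OIDs and runs snmpget for each task index. It assumes the layout described in the Qlikview Nagios Plugin post (`.2.n` is the task name, `.3.n` the task status) and the same port 4721 and `public` community used above. Note that the example above starts at `.0` while the plugin comments say n > 0, so the starting index is a parameter; the script name in the usage comment is hypothetical.

```python
import subprocess
import sys

# Task table prefix: BASE_OID.2.n is the task name, BASE_OID.3.n the status
BASE_OID = "1.3.6.1.4.1.30764.1.2.2.1.1"


def task_oids(n):
    """Return (name_oid, status_oid) for task index n."""
    return "%s.2.%d" % (BASE_OID, n), "%s.3.%d" % (BASE_OID, n)


def snmpget_cmd(host, oid, port=4721, community="public"):
    """Build the snmpget command line used above (including -L o for Cygwin)."""
    return ["snmpget", "-v", "1", "-c", community, "-L", "o",
            "%s:%d" % (host, port), oid]


if __name__ == "__main__":
    # usage (hypothetical name): query_tasks.py HOST COUNT [FIRST_INDEX]
    host, count = sys.argv[1], int(sys.argv[2])
    first = int(sys.argv[3]) if len(sys.argv) > 3 else 1
    for n in range(first, first + count):
        for oid in task_oids(n):
            subprocess.call(snmpget_cmd(host, oid))
```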
The only caveat is that SNMP is off by default in Qlikview, so you need to enable it. For example:
Enable SNMP for ALL Qlik services in their subdirectories under %Program Files%\QlikView\.
Files to edit are:
1. QVManagementService.exe.config
2. QVDistributionService.exe.config
3. QVDirectoryServiceConnector.exe.config
Tuesday, June 25, 2013
OpenLDAP pwdPolicySubentry and Replication
Over the weekend I decided to create a new policy for system users. The new policy would not enforce password expiration for these special system users.
Everything worked great except that the internal (operational) attributes, such as pwdPolicySubentry, did not replicate to the consumer.
After reading the man pages, I found an "attrs=*" entry in the syncrepl section of slapd.conf. Since "*" covers only user attributes, this omitted the operational attributes. To correct this, I simply deleted the entry; according to the man pages, the default is "attrs=*,+", which replicates everything, including operational attributes. But after restarting, it still didn't work. I had to go back and modify the affected accounts. The modification presumably caused syncrepl to send those entries again, and the modified attribute along with all the operational attributes then came over to the consumer.
This is OpenLDAP 2.3.x running on RHEL 5.x using syncrepl.
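To audit other consumers for the same problem, here is a rough Python sketch that reads the attrs= value out of a syncrepl directive and reports whether operational attributes ("+") are in scope. It assumes the directive's continuation lines have already been joined into one string, treats a missing attrs= as the "*,+" default quoted from the man page above, and only looks for the "+" wildcard (an explicit list naming individual operational attributes is not recognized). The provider hostname in the example is hypothetical.

```python
import shlex

# Default when attrs= is omitted, per the man page quoted above:
# "*" = all user attributes, "+" = all operational attributes.
DEFAULT_ATTRS = ["*", "+"]


def syncrepl_attrs(directive):
    """Return the attrs list from a syncrepl directive (continuation lines
    already joined into one string), or the default if attrs= is absent."""
    for token in shlex.split(directive):  # handles quoted values like "dc=a,dc=b"
        key, sep, value = token.partition("=")
        if sep and key == "attrs":
            return [a.strip() for a in value.split(",") if a.strip()]
    return list(DEFAULT_ATTRS)


def replicates_operational(directive):
    """True if the "+" wildcard (all operational attributes) is in scope."""
    return "+" in syncrepl_attrs(directive)


if __name__ == "__main__":
    example = ('syncrepl rid=001 provider=ldap://provider.example.com '
               'type=refreshAndPersist searchbase="dc=example,dc=com" attrs="*"')
    print(syncrepl_attrs(example), replicates_operational(example))
```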
Friday, April 26, 2013
rmdir: Device or resource busy
I was reconfiguring autofs, and when I tried to rmdir one of the directories, I got "Device or resource busy". OK, so let's just run 'fuser -v /dir'. To my surprise, it returned nothing. Alright, back to good old lsof and... nothing.
Strange. Looks like a bug.
So I had to revert my autofs changes and put everything back the way it was. Once I did that, I restarted autofs, and fuser -v correctly identified the rogue directory as being held open by autofs.
Then all I had to do was shut down autofs, remove the rogue directory, and reconfigure autofs once again.
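One thing worth checking in this situation: on Linux, rmdir(2) also fails with EBUSY when the directory is in use as a mount point, and fuser/lsof look for open files rather than mounts. I can't confirm that was the cause here, but a quick way to rule it out is to look the directory up in /proc/mounts. Here is a minimal Python sketch of that check; the sample mount line in the tests is made up for illustration.

```python
import os
import sys


def mount_entries(mounts_text):
    """Parse /proc/mounts-style text into (device, mount_point, fstype) tuples."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # The kernel escapes spaces in paths as \040
            mount_point = fields[1].replace("\\040", " ")
            entries.append((fields[0], mount_point, fields[2]))
    return entries


def mount_info(path, mounts_text):
    """Return the fstype if path is a mount point, else None."""
    path = os.path.normpath(path)
    fstype = None
    for _, mount_point, fs in mount_entries(mounts_text):
        if mount_point == path:
            fstype = fs  # a later entry is mounted on top of an earlier one
    return fstype


if __name__ == "__main__":
    target = sys.argv[1]
    f = open("/proc/mounts")
    try:
        text = f.read()
    finally:
        f.close()
    fstype = mount_info(os.path.abspath(target), text)
    if fstype:
        print("%s is a mount point (%s); unmount it or stop autofs before rmdir"
              % (target, fstype))
    else:
        print("%s is not listed in /proc/mounts" % target)
```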
Sunday, April 21, 2013
taboot except Exception as e
taboot
Traceback (most recent call last):
File "/usr/bin/taboot", line 19, in ?
import taboot.cli
File "/usr/lib/python2.4/site-packages/taboot/cli.py", line 24, in ?
import taboot.runner
File "/usr/lib/python2.4/site-packages/taboot/runner.py", line 19, in ?
from taboot.util import instantiator
File "/usr/lib/python2.4/site-packages/taboot/util.py", line 25, in ?
from taboot.log import *
File "/usr/lib/python2.4/site-packages/taboot/log.py", line 114
except Exception as e:
^
SyntaxError: invalid syntax
I'm on RHEL 5, and I finally have taboot running after a lot of troubleshooting. This is a terrible show-stopper bug. I installed it with yum from the EPEL repo, and you would think that a package installed with yum would just work, right? Well, not taboot.
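The problem is that taboot's code uses syntax that requires a newer Python than the 2.4 that RHEL 5 ships (note the python2.4 paths in the traceback). So I had to replace every "except Exception as e:" with "except Exception, e:".
The "except ... as e" form was introduced in Python 2.6; Python 2.4 only accepts the older "except ..., e" form.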
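Rather than editing each file by hand, the replacement can be scripted. Here is a minimal Python sketch, an assumption of how one might automate it rather than what I actually ran: it only handles single-line `except ... as name:` clauses, keeps a `.orig` backup of every file it changes, and deliberately avoids `with` and `str.format` so it also runs under RHEL 5's Python 2.4.

```python
import re
import sys

# Matches a single-line clause such as "    except Exception as e:".
# Group 1: "except" plus the exception type(s); group 2: the variable name.
EXCEPT_AS = re.compile(r"^(\s*except\s+[^:]+?)\s+as\s+(\w+)\s*:", re.MULTILINE)


def downgrade_except(source):
    """Rewrite 'except X as e:' to the Python 2.4-compatible 'except X, e:'.

    Returns (new_source, number_of_replacements).
    """
    return EXCEPT_AS.subn(r"\1, \2:", source)


def rewrite_file(path):
    f = open(path)
    try:
        original = f.read()
    finally:
        f.close()
    fixed, count = downgrade_except(original)
    if count:
        for name, text in ((path + ".orig", original), (path, fixed)):
            out = open(name, "w")  # write the backup first, then the fix
            try:
                out.write(text)
            finally:
                out.close()
    return count


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print("%s: %d clause(s) rewritten" % (path, rewrite_file(path)))
```

For example, pointing it at /usr/lib/python2.4/site-packages/taboot/*.py would rewrite the clause in log.py shown in the traceback.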