How to get help if you have problems with mrtg
MRTG seems to raise a lot of questions. There are a number of resources apart from the documentation where you can find help for mrtg.
In the following sections you'll find some additional Frequently Asked Questions, with answers.
Nobody has contributed a @#$%.pmd file yet. Go into the mrtg-2.17.4/translate directory and create your own translation file. When you are happy with it, send it to me for inclusion in the next mrtg release.
This has probably already been done. Check the mrtg-2.17.4/contrib directory. A file called 00INDEX in that directory lists what you can find there.
There are many resources on the net that explain SNMP. Take a look at this article from the Linux Journal by David Guerrero:
http://www.david-guerrero.com/papers/snmp/
And at this rather long document from Cisco:
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/snmp.htm
Remove the *-{week,day,month,year}.png files and start MRTG again. When using MRTG for the first time, you might have to do this twice. This will also help when you introduce new routers into the cfg file.
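The cleanup step above can be sketched as a small Python helper. This is only an illustration, not part of mrtg: the function name `remove_graph_images` and its `dry_run` parameter are made up for this example, and it assumes the graph images sit directly in the directory you pass in.

```python
from pathlib import Path
from typing import List

# The four graph periods named in the file pattern above
PERIODS = ("day", "week", "month", "year")


def remove_graph_images(workdir: str, dry_run: bool = True) -> List[str]:
    """Find *-day.png, *-week.png, *-month.png and *-year.png in workdir.

    Returns the sorted names of the matching files. With dry_run=True
    (the default) nothing is deleted; pass dry_run=False to remove them.
    """
    matched = []
    for period in PERIODS:
        for png in Path(workdir).glob(f"*-{period}.png"):
            matched.append(png.name)
            if not dry_run:
                png.unlink()  # delete the stale graph image
    return sorted(matched)
```

Calling `remove_graph_images("/path/to/workdir")` first (the path is a placeholder) shows what would be removed. Calling it again with `dry_run=False` deletes the files, after which MRTG can be started again.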
Ask the person in charge of your router, or try 'public', as this is the default community name.
Well, the short answer is that when an SNMP query goes out and a response doesn't come back, MRTG has to assume something to put in the graph, and by default it assumes that the last answer we got back is probably closer to the truth than zero. This assumption is not perfect (as you have noticed). It's a trade-off that happens to fail during a total outage.
If this is an unacceptable trade-off, use the unknaszero option.
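As a rough sketch, the option might be set in the cfg file like this. The target name `router1` and the host `router1.example.com` are hypothetical placeholders; check the mrtg configuration reference for the exact keywords your version supports.

```
# Hypothetical target: interface 2 on router1, community 'public'
Target[router1]: 2:public@router1.example.com
# Log unanswered polls as zero instead of repeating the last value
Options[router1]: unknaszero
```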
You may want to know what you're trading off, so in the spirit of trade-offs, here's the long answer:
The problem is that MRTG doesn't know *why* the data didn't come back; all it knows is that it didn't come back. It has to do something, and it assumes it's a stray lost packet rather than an outage.
Why don't we always assume the circuit is down and use zero, which will (we think) be more nearly right? Well, it turns out that you may be taking advantage of MRTG's "assume last" behaviour without being aware of it.
MRTG uses SNMP (Simple Network Management Protocol) to collect data, and SNMP uses UDP (User Datagram Protocol) to ship packets around. UDP is connectionless (delivery is not guaranteed), unlike TCP, where packets are tracked and acknowledged and, if needed, retransmitted. UDP just throws packets at the network and hopes they arrive. Sometimes they don't.
One likely cause of lost SNMP data is congestion; another is busy routers. Other possibilities include transient telecommunications problems, router buffer overflows (which may or may not be congestion-related), "dirty lines" (links with high error rates), and acts of God. These things happen all the time; we just don't notice because many interactive services are TCP-based and the lost packets get retransmitted automatically.
In the above cases, where some SNMP packets are lost but traffic is flowing, assuming zero is the wrong thing to do: you end up with a graph that looks like it's missing teeth whenever the link fills up. MRTG interpolates the lost data to produce a smoother graph, which is more accurate in cases of intermittent packet loss. But with V2.8.4 and above, you can use the "unknaszero" option to produce whichever graph is best under the conditions typical for your network.
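To make the trade-off concrete, here is a simplified Python model of the two behaviours. It is not MRTG's actual code: the function `fill_missing` is invented for this illustration, a lost poll is represented by `None`, and the default behaviour is modelled as simply repeating the last value seen.

```python
from typing import List, Optional


def fill_missing(samples: List[Optional[float]], unknaszero: bool = False) -> List[float]:
    """Fill in polls that got no SNMP reply (represented as None).

    By default the last value seen is repeated; with unknaszero=True
    the missing poll is recorded as zero instead.
    """
    filled = []
    last = 0.0  # assumption: zero is used until a first reply arrives
    for value in samples:
        if value is None:
            # No reply came back: pick one of the two guesses
            filled.append(0.0 if unknaszero else last)
        else:
            filled.append(value)
            last = value
    return filled


# A busy link where two polls were lost to congestion
polls = [90.0, None, 95.0, None, 92.0]
print(fill_missing(polls))                   # [90.0, 90.0, 95.0, 95.0, 92.0]
print(fill_missing(polls, unknaszero=True))  # [90.0, 0.0, 95.0, 0.0, 92.0]
```

The second result is the "missing teeth" graph. During a real outage, however, every poll is lost, and the first result is the flat line that hides it.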
Tobias Oetiker <[email protected]>