DansGuardian: Performance Tuning

From OnnoWiki

Source: http://contentfilter.futuragts.com/wiki/doku.php?id=performance_tuning


Performance Tuning

   The hurrier I go, the behinder I get.

The performance of a DansGuardian system can be so good the filtering isn't even detectable by users. There's no good reason to put up with much less. Unfortunately the performance of DansGuardian/Squid when first installed may not be anywhere near optimal, so some tuning will often be in order. This document attempts to answer the question: When “tuning” is required, where should one look?

It's hard to say which of the factors below is most important. If possible attend to them all. But if you can do only one thing, tune for minimal swapping.

(Any local proxy – including things like Tinyproxy and Oops! – can be used as the backend for DansGuardian. But saying so every time would make this document even longer. So in many places this document uses the shorthand term “Squid” to mean simply “whatever backend proxy is being used with DansGuardian”.)

How Much?

Hardware capabilities change so fast that carefully vetted guidelines would become obsolete in a year and downright misleading in two years. The performance requirements of a DansGuardian/Squid system vary all over the map depending on a whole lot of different factors. It's common to find two systems that at first glance appear to be the same, yet are tuned very differently; further examination generally turns up a difference that was thought not to matter. For example, just the addition of one thing as supposedly simple as anti-virus scanning can sometimes dramatically change the performance of the DansGuardian/Squid system.

Factors that can dramatically affect performance requirements include:

   frequency of downloading of very large files
   frequency of use of streaming media
   custom filtering (especially the amount of phrase filtering)
   anti-virus scanning
   use of external anti-phishing services
   clumps of users visiting the same websites
   large groups of users using the web at exactly the same time
   local DNS architecture

Outline of Problem Solving Strategy

Step # 1 of 3

First set up your gateway so an end user computer can access the Internet directly without using DansGuardian/Squid at all. Doing this both

   establishes your baseline as performance can't be any better than this no matter what and
   exposes any possible performance problems that have little to do with DansGuardian/Squid.

Step # 2 of 3

Second, start Squid and reconfigure either your browser or your IPtables so web traffic flows through Squid. Tune this configuration to provide performance that's almost as good as direct access. You may need to tune the size of the Squid cache and/or the number of cache directories and subdirectories to match your expected usage pattern. If performance is poor with just Squid interposed in the traffic path like this, the most likely problems are:

   inadequate DNS service
   inadequate amount of RAM
   needlessly complex configuration, including improper use of regular expressions

(Normally all configured restrictions should be in DansGuardian, rather than splitting configuration between DansGuardian and Squid [even though split configuration is technically possible]. In normal use with DansGuardian, there will be only a few uses of url_regex… and no other uses of regular expressions at all.)

Step # 3 of 3

Only then, as the third step, add DansGuardian. Again reconfigure either your browser or your IPtables so traffic now flows through DansGuardian as well as Squid.

The additional memory usage may throw off the system balance you thought you'd already tuned. So you may need to go back and change your Squid configuration again.

General Performance Tuning

   CPU
   Squid is largely I/O bound, and while DansGuardian uses a non-trivial amount of CPU it's hardly ever CPU bound when reasonably configured. So for a DansGuardian/Squid server it's quite sensible to use hardware that has considerably less than the fastest available processor.
   RAM
   Definitely add as much RAM as possible to a DansGuardian/Squid system. Usually RAM provides the best bang for the buck. DansGuardian/Squid systems with much much more than 2GB of RAM aren't uncommon. (Don't worry that this amount of memory seems more typical for some other OSs like Windows Vista rather than for Linux.)
    Squid's RAM↔DISK balance is controlled by its maximum_object_size_in_memory parameter. (The memory_pools_limit parameter may also need to be tweaked.) If you have a lot of RAM, raise the value higher to make more use of the RAM. Note that changing this setting may require you to tune for minimal swapping all over again.
    (Squid's use of shared memory is so sophisticated that it confuses most known performance monitoring tools; right after boot X amount of memory will supposedly be free, but then after some load comes and goes, the same tool will now say that Y amount of memory is free even though the conditions haven't changed. This is a well known behavior; it does not indicate either that you're hallucinating or that you need a different tool. What it means is that memory readings right after booting or restarting the filter are not very useful. To make a memory measurement, let at least ten minutes elapse after the filter starts, and arrange for some non-trivial system load.)
   I/O
   Squid is usually I/O bound. Improvements to the I/O subsystem will generally pay off.
   Network
   Connectivity from each end user computer over your network to your DansGuardian/Squid server should be very good. Substandard network wiring or equipment can cause the end user computers connected through it to behave badly.
   Server Network Connections
   The network connections from the DansGuardian/Squid system to both your internal network and to the Internet should be very good.
    After the system has been running a while, run ifconfig (may only be on the superuser's PATH) and look at both the TX and RX packet count lines. On each and every network interface, “errors” as a percentage of “packets” should at minimum be less than 0.01% (ideally “errors” should be zero). If your server has several NICs “bonded” together, ensure all of the NICs work well.
   Interactive Activity
   Network servers should not also routinely host interactive activity. Especially, the GUI (probably X11 and more) should not be running (not even if no one is logged on); a GUI can easily suck up as many resources as all the other processes on the computer put together. So logins should be the old text command-line style. On most distributions this means “runlevel” should be 3 rather than 5. To have this happen automatically whenever the system is rebooted, you may need to edit /etc/inittab so the startup line says id:3:initdefault:.
   Runaway Processes
   Beware unnecessary and/or runaway processes. Stop everything you don't need from running (Samba? Apache? OpenLDAP? CUPS? etc? etc?) With the system under load (and after making sure the system is in runlevel 3), run top for a couple minutes, and note which applications sort to the top of the list. If they are applications you don't need, both stop them and change the system configuration so they won't start again when the system is rebooted (likely your distribution's tool for controlling which applications start at boot time is either chkconfig or update-rc.d). If the top few lines almost always show the same application, which as far as you know should be idle, figure out why it's sucking up so many resources, then fix it.
   Kernel
   In a few cases with extremely large systems (typically over a thousand simultaneous users), you will hit a kernel limit with either the number of file descriptors or the size of a directory. If this happens you may need to reconfigure your kernel or even rebuild it. But these cases are quite rare. In the vast majority of cases no kernel configuration parameters are useful in tuning DansGuardian/Squid performance.
    (Especially with BSD-derived kernels, kernel configuration parameter tuning may be necessary for a different reason than performance: to provide stable DansGuardian/Squid operation. For details see Operation Under NetBSD/FreeBSD/OpenBSD.)
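
The error-rate check described under Server Network Connections can be scripted. The sketch below is only an illustration, assuming the older net-tools ifconfig layout in which counters appear as “RX packets:N errors:N” (newer versions print a different layout and would need a different pattern). The names COUNTERS, LIMIT_PERCENT and error_rates are hypothetical, and the sample text is made up.

```python
import re

# Assumption: old net-tools layout, e.g. "RX packets:200000 errors:10 dropped:0".
COUNTERS = re.compile(r"(RX|TX) packets:(\d+) errors:(\d+)")

LIMIT_PERCENT = 0.01  # threshold from the text: errors below 0.01% of packets


def error_rates(ifconfig_text):
    """Return (direction, errors as a percentage of packets) per counter line."""
    rates = []
    for direction, packets, errors in COUNTERS.findall(ifconfig_text):
        packets, errors = int(packets), int(errors)
        percent = 100.0 * errors / packets if packets else 0.0
        rates.append((direction, percent))
    return rates


# Made-up sample for a single interface.
SAMPLE = """\
          RX packets:200000 errors:10 dropped:0 overruns:0 frame:0
          TX packets:150000 errors:45 dropped:0 overruns:0 carrier:0
"""

for direction, percent in error_rates(SAMPLE):
    verdict = "ok" if percent < LIMIT_PERCENT else "investigate"
    print(f"{direction}: {percent:.4f}% errors ({verdict})")
```

In practice the text would come from running ifconfig once per interface after the system has been up for a while, as described above.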

Performance Issue: Swapping

If overall system usage rises to the point where swapping starts to occur, in some cases performance will “fall off a cliff”. The conventional wisdom says virtual memory is a very good thing, unimportant memory pages are swapped out first, and swapping is just a normal part of the operation of a modern system. This is almost always true but not for a DansGuardian/Squid system if the system swap area and the Squid cache storage are on the same disk. The disk head contention will be so severe at the worst possible time that overall performance will be that of a very badly overloaded system (even though the CPU usage is low).

Often Squid will at first seem to run fine, but when DansGuardian is also added and the system is loaded, overall performance will suffer dramatically. What has happened is the additional RAM usage of DansGuardian has caused swapping to become routine, thus affecting Squid. Despite appearances, the problem is not with DansGuardian itself.

How To Tell

Run vmstat 5 for a couple minutes when the system is under load. The si (swap in) and so (swap out) columns should be very low. If there are non-zero numbers in these columns, swapping is occurring.

What To Do

Double-check that no unnecessary applications are running, double-check that no GUI (X11 etc.) is running, and add RAM.

Then tune both Squid and DansGuardian configurations to reduce memory usage so that swapping seldom occurs (alternatively at least put the system swap space and the Squid data cache on different drives). Remember to check swapping only after the system has experienced a non-trivial load and has settled; checking for either the amount of free memory or for evidence of swapping immediately after restarting DansGuardian/Squid will be misleading. There are a great many parameters in both Squid configuration and DansGuardian configuration that affect the amount of memory that's used.
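
To re-check swapping after each tuning change, the vmstat test from How To Tell can be scripted. The sketch below is a minimal illustration, assuming the usual Linux vmstat layout in which a header row names the si and so columns. The function names and the sample output are hypothetical. The first data row is skipped because vmstat's first report gives averages since boot rather than current activity.

```python
def swap_activity(vmstat_text, skip_first=True):
    """Return a list of (si, so) pairs parsed from captured vmstat output."""
    si_idx = so_idx = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Header row (vmstat repeats it periodically): locate the columns.
            si_idx, so_idx = fields.index("si"), fields.index("so")
            continue
        if si_idx is None or not fields or not fields[0].isdigit():
            continue  # banner line, blank line, or data before any header
        samples.append((int(fields[si_idx]), int(fields[so_idx])))
    return samples[1:] if skip_first else samples


def is_swapping(samples):
    """True if any sample shows non-zero swap-in or swap-out."""
    return any(si or so for si, so in samples)


# Made-up capture of "vmstat 5" output.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 812345  10240 204800    3    1    12     8  100  200  5  1 93  1  0
 0  0      0 811000  10240 204900    0    0     0     4  110  210  2  1 97  0  0
 2  1   5120 700000  10240 205000   40   96   300   500  400  900 10  5 60 25  0
"""

samples = swap_activity(SAMPLE)
print(samples, "swapping" if is_swapping(samples) else "no swapping")
```

As the text stresses, a capture is only meaningful once the system has been under a non-trivial load for a while.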

(Admittedly this advice on reducing memory usage is rather vague; attempt the general goal of minimizing swapping anyway. This document can't be more specific because there are so many parameters that have at least some effect on memory usage, and every installation is different.)

Performance Issue: DNS

In a DansGuardian/Squid system, the website name lookups done by the individual browsers are no longer the only significant DNS activity on the network. The DansGuardian/Squid server does at least one additional name lookup for nearly every web request. Your Internet connection drop can become so saturated with DNS traffic that it sometimes isn't available for web pages.

As a result both of DansGuardian/Squid's need for DNS service and of the congestion of the Internet connection drop, web browsing performance can be very significantly degraded if DNS service to the DansGuardian/Squid server itself is anything less than excellent or the Internet connection drop doesn't have sufficient capacity.

With DansGuardian/Squid, good name service to the end user computers is not enough; the DansGuardian/Squid server itself now requires excellent name service too.

How To Tell

Execute time dig www.atomictimeclock.com - the “real” time should be less than a third of a second. Execute time dig www.atomictimeclock.com a second time - this time the answer should come out of some cache so the “real” time should be less than a tenth of a second (sometimes a whole lot less).

What To Do

Consider each of these items, and where necessary rebalance the item against your other needs. You may for example decide that having host names in the DansGuardian log is worth the performance penalty. On the other hand you may decide that performance is more important. Don't necessarily do all these things, but think about each one.

   Check the local /etc/hosts file to be sure there are no obsolete or erroneous entries in it. Every single name in the file should answer immediately to a ping test (unless you purposely use bogus entries expecting them to cause significant delays).
   Check the configuration of the DNS system on the DansGuardian/Squid server. Every “nameserver” listed in /etc/resolv.conf should be a current name server that's immediately accessible. (Be careful though about modifying /etc/resolv.conf. If the computer uses DHCP to obtain an IP address, the contents of /etc/resolv.conf may be recreated every time DHCP is activated, so any changes you make will be overwritten. Fortunately if the contents of /etc/resolv.conf were created by DHCP, they are almost certainly correct and no changes will be required.)
   Check that in squid.conf log_fqdn still has its default value of off. Turning it on may make the Squid logs easier to analyze, but at a significant performance cost. This reverse name translation happens in real time as sites are accessed, rather than as a batch operation later when the log is being analyzed.
   Check that in dansguardian.conf logclienthostnames still has its default value of off. Turning it on may make the DansGuardian logs easier to analyze, but at a significant performance cost. This reverse name translation happens in real time as sites are accessed, rather than as a batch operation later when the log is being analyzed.
   Check all the DNS-related entries in squid.conf. Most likely the dns_nameservers line will simply be omitted, which means use the nameservers specified in /etc/resolv.conf instead. If the dns_nameservers line is present and lists any inactive or unreachable DNS servers, correct it (probably by simply deleting it altogether) as this error can seriously degrade performance.
   Implement a “caching DNS” system for your whole internal network. This is the ideal solution, but it may not be reasonably feasible. If you do this be sure to i) change your own DHCP server so your internal systems get their name service from your new caching DNS, ii) reconfigure your server's DHCP client so it doesn't overwrite /etc/resolv.conf, and iii) modify your /etc/resolv.conf to point at your new caching DNS.
    You may be able to implement a “caching DNS” system with the dnsmasq program rather than with BIND servers.
   Use “traffic shaping” to give priority to both DNS requests and DNS responses. Note well that for a network server (rather than the individual system many Traffic Shaping HowTos assume) DNS requests and responses should have higher priority than ACKnowledgements.
    The simplest way to set up traffic shaping may be with Shorewall. You shouldn't have to install anything special; only the SFQ and HTB queuing disciplines which are included in a standard Linux kernel will be used. Although the exact configuration of traffic shaping is outside the scope of this document, here's an example of the relevant lines from the Shorewall tcrules file:
   1       0.0.0.0/0       0.0.0.0/0       tcp             53
   1       0.0.0.0/0       0.0.0.0/0       udp             53
   (Note these rules depend on the destination port being 53, which will always be the case. They do not need to be changed to accommodate the source port being random as it is after the recent fix to DNS.)
   If it appears that your ISP's DNS servers are too slow, try using the OpenDNS servers (http://www.opendns.com) instead. (Your ISP's name servers usually have the significant advantage of being very few hops away, thus providing excellent name service even when they're not particularly fast. Nevertheless issues do sometimes arise, so at least consider the possibility the other end is too slow although things are just fine on your end.)
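
After changing any of the items above, the timing test from How To Tell is worth repeating. The sketch below is one possible way to do that from a script, under the assumption that the system resolver (via socket.gethostbyname) is an acceptable stand-in for dig; the two can differ, for example when a local name cache sits in front of the resolver. The function names and the result keys are hypothetical. The two limits are the ones given in the text.

```python
import socket
import time

FIRST_LOOKUP_LIMIT = 1.0 / 3.0  # seconds: uncached lookup, from the text
CACHED_LOOKUP_LIMIT = 0.1       # seconds: repeated (cached) lookup, from the text


def timed_lookup(hostname, resolver=socket.gethostbyname):
    """Return the wall-clock seconds one name lookup takes."""
    start = time.perf_counter()
    resolver(hostname)
    return time.perf_counter() - start


def check_dns(hostname, resolver=socket.gethostbyname):
    """Look the name up twice and compare both timings with the limits."""
    first = timed_lookup(hostname, resolver)
    second = timed_lookup(hostname, resolver)
    return {
        "first": first,
        "second": second,
        "first_ok": first < FIRST_LOOKUP_LIMIT,
        "second_ok": second < CACHED_LOOKUP_LIMIT,
    }


if __name__ == "__main__":
    # Needs working name service; the hostname is the one used in the text.
    print(check_dns("www.atomictimeclock.com"))
```

The resolver argument exists only so the timing logic can be exercised without network access; on the real server the default is what matters.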

Performance Issue: I/O

The Squid part of DansGuardian/Squid is I/O intensive and will magnify any glitch in the server's I/O.

What To Do: Segregating Disk Traffic

If possible put the Squid cache on a different disk than the rest of the system. Having the OS, the swap space, the webcache, and the logs all on the same disk causes lots of head seek contention which may be noticeable as jerky (or even absurdly slow) response. In fact disk head contention between swapping and caching may contribute significantly to the “performance falling off a cliff” syndrome often seen when swapping starts.

If the Squid cache is on its own partition, you can mount that partition with the noatime mount parameter (which eliminates the additional disk accesses needed to record file “access time”). This will speed up Squid performance, and is highly recommended wherever possible. (But don't do this if other applications share the same disk partition, as some applications [obviously not Squid] do not work correctly with this mount parameter.)

Even better is if the Squid cache disk is not only a separate partition on a separate disk but the disk is connected to a separate I/O channel (or one with nothing else but a CD drive that's not usually active). And even better than that is if the Squid cache partition is the only thing on a disk which is not otherwise used at all.

Consider too that Anti-Virus scanning is another function that generates an awful lot of disk file activity. All the Anti-Virus scanning file activity will be in one directory (typically /tmp). Again disk head contention can be a problem. Locate the Anti-Virus scanning directory carefully, maybe even on a separate partition on a separate disk on a separate I/O channel.

One possibility that might provide even higher performance is moving the webcache and the A-V temp space to high quality SSDs (Solid State Disks) or hybrid storage devices. Such devices appear to the I/O channel to be disks, but are actually partly or entirely memory devices. Be wary though of the bewildering array of SSD devices and capabilities, not all of which are helpful or even appropriate, particularly as they usually support only a limited number of change cycles.

What To Do: Avoiding And Using RAID

Some implementations of some RAID options are very fast, and these can sometimes be helpful. Many implementations of many RAID options though emphasize reliability over speed, and should be avoided for webcache use. It often makes more sense to consider the webcache “expendable” and simply reinitialize it from scratch after a disk failure instead of putting it on any RAID configuration.

If you nevertheless choose to use RAID for your webcache, be very careful as RAID often extracts a performance penalty in compensation for increased reliability. Especially avoid software RAID for anything other than mirroring. Performance of even hardware RAID and even with identical RAID options can differ dramatically; don't just assume anything like “all hardware RAID striping is fast”.

Performance Issue: Poor Regular Expressions

Normally Regular Expressions are used just a little bit by Squid and just a little bit by DansGuardian. Occasionally user configuration causes Regular Expressions to be used much more heavily. In the vast majority of cases use of Regular Expressions does not cause any problem (performance or otherwise) at all.

However certain misuses of Regular Expressions and certain mis-constructed Regular Expressions can significantly degrade performance.

Misused Regular Expressions

In Squid, identification of particular hosts should usually be done with dstdomain. In most cases using a regular expression to identify a particular host not only isn't necessary but also can significantly degrade performance. So if you see regular expressions being used to identify particular hosts, investigate using dstdomain instead.

Poorly Constructed Regular Expressions

If you see ^.* or .*$ in a regular expression, suspect that whole regular expression. Such constructs are never necessary, and prevent many internal optimizations. They will unnecessarily degrade performance a bit. More importantly, they suggest that the whole regular expression is of poor quality, so you should closely examine the whole thing.

Exponential Match Regular Expressions
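
The constructs this subsection warns about (a group whose contents end in * or + and which is itself followed by * or +) can be searched for mechanically. The sketch below is a rough heuristic, not a regular expression parser: it skips escaped characters and simple character classes, ignores groups opened with (?> , and does not attempt to judge whether a flagged expression really is slow. The function name is hypothetical.

```python
def find_nested_quantifiers(pattern):
    """Return the index of each ')' that follows * or + and precedes * or +.

    Groups opened with "(?>" are ignored, matching the exception in the text.
    """
    hits = []
    group_is_atomic = []      # one entry per currently open group
    in_class = False          # inside a [...] character class
    prev_is_quantifier = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        is_quantifier = False
        if ch == "\\":
            i += 1            # skip the escaped character as well
        elif in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            group_is_atomic.append(pattern.startswith("(?>", i))
        elif ch == ")":
            atomic = group_is_atomic.pop() if group_is_atomic else False
            followed = pattern[i + 1:i + 2] in ("*", "+")
            if prev_is_quantifier and followed and not atomic:
                hits.append(i)
        elif ch in "*+":
            is_quantifier = True
        prev_is_quantifier = is_quantifier
        i += 1
    return hits


for candidate in ["(a+)+$", "(?>a+)+$", r"(a\+)+", "(ab*)*c", "^ads?\\."]:
    flagged = find_nested_quantifiers(candidate)
    print(candidate, "-> suspect" if flagged else "-> no nested quantifier")
```

Anything the sketch flags still needs a human look; the point is only to narrow down which lines of a long expression list deserve attention first.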

If you see *)*, *)+, +)*, or +)+ [not immediately preceded by (?> ], you are looking at an “exponential match” regular expression. These are sometimes called never ending because they work so slowly users often give up on them before they terminate. Such constructs are never necessary; there's always a way to construct a better regular expression that does the same thing but without exacting the exponential performance penalty.

Performance Issue: Child Processes

DansGuardian is architected as a large number of closely cooperating processes. The number of child processes is controlled by quite a few …children settings in dansguardian.conf. For best performance, the number of DansGuardian child processes should be roughly matched to your usage patterns.

The exact number of child processes is not usually critical; the number of child processes can be off by a factor of two (or even more) with little noticeable effect on performance. However extreme mismatches between your …children parameters and your usage pattern can have a noticeable performance impact. (If the number of child processes changes memory utilization so the amount of swapping changes significantly, this may in turn have a large effect on performance. The performance impact may seem to be related to the number of child processes, when actually the relation is indirect through the amount of memory used.)

If the number of child processes is too small, system response will be poor for a couple of reasons. The first reason is that users that just became active may be forced to wait while a new subprocess is launched. The second reason is that each child process will be responsible for handling so many different users that users will begin to interfere with each other. If on the other hand the number of child processes is too large, processes will be lying around using up RAM but not doing much that is useful. When the RAM usage gets too large, the system will begin to swap, which often in turn will degrade performance. (Note well this means that increasing the number of child processes beyond having “enough” will not improve performance; in fact just the opposite is true, it will reduce performance.)

The maxagechildren parameter seldom has any significant impact on performance. It's a “fail safe” mechanism that minimizes the effects of a runaway child process. It causes each child process to commit suicide after processing that number of requests. As runaway child processes are now extremely rare, the prudence of this approach may not be particularly relevant any more. In a few cases, if the parameter is set to a very small value (only a few hundred), it can cause so much “churn” of continually creating new child processes that DansGuardian performance is reduced. If you find this is what's happening, increase the parameter to a much larger value (thousands or even tens of thousands).

What To Do

A good guideline is to simply turn the …children parameters up until the computer begins to swap heavily, then turn them back down until the computer stops swapping. (If swapping isn't an issue in your environment, turn the …children parameters up until maxchildren is approximately the same as your number of simultaneous users.)

Initially, increase or decrease all the …children parameters proportionally, maintaining roughly the same ratio between them as they had with their default values. (If you understand both the parameter names and the comments about them in dansguardian.conf, you can adjust each parameter separately, and you may get slightly better results. But the difference is not so large as to be a concern; it's usually reasonable to simply increase or decrease all the parameters proportionally in lockstep.)
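
The proportional adjustment described above amounts to multiplying every process-count parameter by the same factor. The sketch below illustrates the arithmetic only. The starting values are illustrative assumptions, not necessarily the defaults in your dansguardian.conf, and the function name is hypothetical. maxagechildren is left out because it counts requests rather than processes. The cap of 1018 on maxchildren is the built-in limit this document discusses under Many Many Simultaneous Users.

```python
MAXCHILDREN_CEILING = 1018  # built-in limit mentioned in this document (1024 - 6)


def scale_children(settings, factor):
    """Scale every setting by the same factor, keeping the ratios roughly intact."""
    scaled = {name: max(1, round(value * factor)) for name, value in settings.items()}
    if "maxchildren" in scaled:
        scaled["maxchildren"] = min(scaled["maxchildren"], MAXCHILDREN_CEILING)
    return scaled


# Illustrative starting point (assumed values; check your own dansguardian.conf).
current = {
    "maxchildren": 120,
    "minchildren": 8,
    "minsparechildren": 4,
    "preforkchildren": 6,
    "maxsparechildren": 32,
}

for name, value in scale_children(current, 2).items():
    print(f"{name} = {value}")
```

With the assumed values, doubling gives maxchildren = 240 and maxsparechildren = 64. After each change, check for swapping again under load before scaling further.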

(Some of the comments in dansguardian.conf that give guidelines for these parameters may be outdated. Treat these parameters pragmatically; if your own experience disagrees with the comments, give preference to your own experience.)

Performance Issue: Many Many Simultaneous Users

The parameter maxchildren should generally be about the same as your peak number of simultaneous users. But attempts to increase this parameter beyond 1018 (the funny number is 1024 - 6) just result in a weird error message that says something about “rabbits”. This never used to matter much, because typical PC hardware couldn't handle more than about a thousand simultaneous users anyway. But some larger faster PC hardware could now greatly exceed this limit.

The limitation of maxchildren is built in (actually it's a restriction of the select system call and the Posix standard) and can't be changed easily. To increase the limitation you will have to use a custom procedure to rebuild DansGuardian. For more information and a possible procedure for rebuilding DansGuardian, see question Usage#11c in the Wiki FAQ.





