.\" Man Page for BFILTER
.\" groff -man -Tascii bfilter.8
.TH BFILTER 8 "March 2006"
.SH NAME
BFilter \- An ad-filtering web proxy using heuristic ad-detection algorithms
.SH SYNOPSIS
.sp
.B bfilter
.BI "[-c " directory ]
.BI "[-r " directory ]
.BI "[-u " user ]
.BI "[-g " group ]
.B [-n]
.BI "[-p " file ]
.B [-k]
.B [-h]
.B [-v]
.SH "DESCRIPTION"
.PP
.B BFilter
is a web proxy that uses effective heuristic ad-detection algorithms to remove banner adverts, popups and webbugs from web pages. The traditional blocklist-based approach is also implemented, but it is mostly used for dealing with false positives. Unlike other tools that require constant updates of their blocklists, bfilter manages to remove over 90% of adverts even with an empty blocklist!
.P
All processing is done on the fly: bfilter does not load the whole page or image before processing it. It uses heuristic and regex-based approaches to detect adverts and webbugs. It also uses a JavaScript engine to combat JavaScript-generated adverts and popups.
.P
The web proxy supports the following features:
.PP
.B o
HTTP/0.9 - HTTP/1.1 support
.br
.B o
Persistent connections (HTTP/1.1 only)
.br
.B o
Pipelining (HTTP/1.1 only)
.br
.B o
HTTP compression
.br
.B o
Forwarding to another proxy
.P
However, it does
.B not
support CONNECT requests typically used for HTTPS.
.SH OPTIONS
.TP
.BI "-c, --confdir " directory
Set a custom config directory
.TP
.BI "-r, --chroot " directory
Set the chroot directory. It must contain the config directory; if no config directory is specified, the chroot directory is used as the config directory.
.TP
.BI "-u, --user " user
Set unprivileged user
.TP
.BI "-g, --group " group
Set unprivileged group
.TP
.B -n, --nodaemon
Disable background daemon mode
.TP
.BI "-p, --pid " file
Write process ID to a file
.TP
.B -k, --kill
Kill the running process specified with -p
.TP
.B -h, --help
Show help
.TP
.B -v, --version
Print version
.SH FILES
The default configuration settings for bfilter are in files located under the
.B /etc/bfilter
directory (and optionally
.B ~/.bfilter
for the user GUI configuration).
.PP
For the base configuration the
.B config
and
.B config.default
files are used. For the URL pattern matching the
.B urls
and
.B urls.local
files are used. For the content filtering the
.B filters/
directory may contain files specifying groups of filters and whether they are enabled.
.SH PROXY CONFIGURATION
.LP
There are two configuration files:
.BR config.default ,
which is shipped with bfilter and is overwritten when upgrading, and
.BR config ,
which has a higher priority so it can override settings specified in the config.default file. The following parameters can be defined in these files.
.PP
.I listen_address = host:port
.br
The address and port to which the proxy binds. If host is unspecified, the proxy binds to all interfaces. Multiple addresses separated by commas may be specified.
.PP
.I client_compression = yes | no
.br
If set to yes, all textual data with "Content-Type: text/*" is compressed before it is sent to the client. This option can be useful if you are on a slow connection and bfilter runs somewhere on a fast connection. In other cases, setting this option to yes just introduces additional latency when loading pages.
.PP
.I ad_border = rrggbb | none
.br
The default behaviour is to draw borders around removed adverts. You may want to change the border color or turn the borders off.
.PP
.I try_icon_animation = yes | no
.br
Enable or disable the tray icon animation, which indicates that traffic is passing through bfilter (GUI only).
.PP
.I max_script_fetch_size = size_in_kilobytes
.br
Limits the size of external scripts that bfilter fetches for processing. Browsing with bfilter should feel as fast as or faster than browsing without it. The only thing that can make it feel slower is the need to fetch external scripts in order to analyze them. A browser can usually cache external scripts, but bfilter downloads them each time for analysis. If you have a caching proxy server between bfilter and the internet, it will cache scripts for bfilter; otherwise you may want to adjust this parameter.
.PP
.I max_script_eval_size = size_in_kilobytes
.br
Protects against compressed scripts that decompress to very large sizes.
.PP
.I max_script_nest_level = number
.br
Limits the number of nested scripts that bfilter fetches for processing (similar reasoning to max_script_fetch_size). A smaller value like 3 will make bfilter faster, while a bigger value like 9 will make it detect more ads. (However, the author has never seen an ad generated at a nesting level above 6.) Setting this value to 0 disables script processing.
.PP
.I save_traffic_threshold = size_in_kilobytes
.br
Sometimes bfilter needs to download an image or a Flash file to determine whether it is an advert. Since bfilter tries to do everything on the fly, it usually knows the answer before the whole file is downloaded. At that point it checks how much data is left to be downloaded, and if that is more than the value of this parameter (or if the size is unknown), bfilter drops the connection to the server in order to save traffic. The default value of 15 is good for most people, but if you use a dialup or a GPRS connection you may want to lower it to maybe 8, and if you use a satellite connection you may want to raise it to maybe 40.
.PP
.I report_client_ip = yes | no | fixed_ip
.br
Enable reporting the client IP to servers using the X-Forwarded-For header.
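.PP
As an illustration, a
.B config
file that overrides some of the parameters described so far might look like the sketch below. It assumes the plain "name = value" form used in this section. The values are assumptions chosen for illustration, not shipped defaults: the address and port are hypothetical, 6 is the deepest nesting level at which the author reports having seen an ad, and 8 is the threshold suggested above for dialup or GPRS connections.
.IP
listen_address = 127.0.0.1:8080
.br
client_compression = no
.br
ad_border = none
.br
max_script_nest_level = 6
.br
save_traffic_threshold = 8
.br
report_client_ip = no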
.PP
.I use_proxy = yes | no
.br
.I proxy_host = host
.br
.I proxy_port = port
.br
When use_proxy is set to yes, bfilter forwards requests to the proxy specified by proxy_host and proxy_port.
.PP
.I no_proxy_for = host, host, host
.br
When use_proxy is set to yes, you may specify some hosts to be contacted directly. The separator may be either a comma or a semicolon. If a host starts or ends with a dot, it is assumed that any prefix or suffix, respectively, can be added to it; for example, "no_proxy_for = .mydomain.com, 192.168.". Note however that .mydomain.com won't cover mydomain.com itself but only its subdomains. (When matching no_proxy_for hosts, no DNS queries are made. That means 127.0.0.1 won't act as localhost or the other way around.)
.SH URL PATTERNS
.LP
BFilter allows you to block an arbitrary URL (web address) and to assign hints to URLs in order to influence the heuristic analyzer. To do so, you assign a tag to a URL; tags cover blocking, hinting and more.
.PP
There are two configuration files:
.BR urls ,
which is shipped with bfilter and is overwritten when upgrading, and
.BR urls.local ,
which has a higher priority so it can override rules specified in the urls file.
.PP
These files specify a number of rules. Each rule has the following syntax:
.IP
.B TAG url_pattern
.PP
Where TAG can be one of the following:
.IP
.B FORBID
Output an error page.
.br
.B HTML
Output a blank page.
.br
.B IMAGE
Output a transparent image.
.br
.B FLASH
Output a blank flash file.
.br
.B JS
Output an empty JavaScript file.
.br
.B ALLOW
Cancel any of the above tags.
.br
.B NOFILTER
Don't filter a page or a script.
.br
.B +++
Be more suspicious about the URL (any number of plus signs).
.br
.B ---
Be less suspicious about the URL (any number of minus signs).
.PP
The last two tags are special. They provide a hint to the heuristic analyzer and are only considered when we already have an ad suspect.
For example, if we have a clickable image on a page, we are going to consider hints for:
.IP
.B o
The image URL.
.br
.B o
The link URL.
.br
.B o
The page URL.
.PP
Sometimes an advert can't be blocked with hints, which can happen if bfilter doesn't see it (probably because of a problem interpreting a script) or doesn't support that kind of advert (text or hover adverts). In that case you may still block it using other tags. Note that hints don't interact with other tags: when we are looking for a hint we don't consider other tags (and vice versa).
.PP
BFilter supports two types of patterns:
.IP
.B o
Simple strings with wildcards.
.br
.B o
Regular expressions.
.PP
The simple string wildcards are ? and *, meaning respectively "any single character" and "any number of any characters". For example:
.IP
FORBID http://ads.somehost.com/*
.PP
This will block any URL starting with "http://ads.somehost.com/". Note that for broad ad-blocking patterns like this, it is recommended to use IMAGE rather than FORBID. This sounds wrong, as we don't know exactly what type of object we are going to replace with an image, but it turns out that IMAGE produces better results than any other tag. Any other tag results in broken images, and FORBID will additionally cause error pages in place of IFRAME ads. Browsers accept an image where HTML was expected just fine and are even smart enough not to interpret an image where a script was expected.
.PP
Regular expression patterns must be enclosed within two slashes. For example:
.IP
JS /http://(www\e.)?somehost\e.com/ads/.*\e.js/
.PP
This regex can be interpreted like this: match "http://", optionally match "www.", match "somehost.com/ads/", match any number of any characters, and then match ".js".
.PP
As a quick summary, in regular expressions:
.IP
.B .
means any character
.br
.B \e.
means the "." character
.br
.B \e?
means the "?" character
.br
.B .*
means any number of any characters including none
.br
.B (this|that)
means "this" or "that"
.br
.B (something)?
means "something" or nothing
.PP
You may find a tutorial and a complete reference on regular expressions at http://www.regular-expressions.info.
.PP
Note that both simple and regex patterns are case insensitive.
.SH CONTENT FILTERS
BFilter allows you to apply regular expressions to page content. This can be used for things like removing portions of a page, altering scripts or injecting your own scripts. Two things make bfilter's implementation of this feature unique:
.IP
.B o
Applying a regex doesn't cause buffering of the whole page.
.br
.B o
Replacement expressions can contain JavaScript code.
.PP
Content filter configuration is not currently covered in this man page. Please view the bfilter web page at http://bfilter.sourceforge.net/doc/content-filters.php for further information.
.SH EXAMPLES
Replace all images from known advert domains with a transparent GIF or an empty Flash file.
.IP
IMAGE /http://(.*\e.)?(doubleclick|fastclick|tradedoubler)\e..*/
.br
FLASH /http://(.*\e.)?(doubleclick|fastclick|tradedoubler)\e..*/
.PP
Prevent hover adverts (DHTML pop-ups) from a known advert domain.
.IP
FORBID /http://([^/]+\e.)?layer-ads\e.de/.*/
.PP
Prevent tooltip adverts from known advert domains.
.IP
JS http://kona.kontera.com/javascript/*
.br
FORBID /http://[^/]+\e.intellitxt\e.com/intellitxt/.*/
.PP
Allow images used to count page views for projects hosted on SourceForge.
.IP
ALLOW /(www\e.)?sourceforge.net/sflogo.php\e?.*/
.PP
Apply hints to suspicious URLs.
.IP
++++++ /http://ads[\ed]*\e..*/
.br
+++++ /.*/(ad[sv]?|advert|banners?)[^a-z].*/
.br
++++ *banners*
.br
+++ *banner*
.br
+++ *click*
.SH NOTES
If the HTML processor is in doubt about an image or a Flash file, it defers the decision until the browser has requested that file. The response is then analyzed (redirects, cookies) as well as the file itself. For an image, the analyzer checks its dimensions and whether it is animated.
For Flash files, the analyzer tries to find a button that covers most of the object's area and has a getURL action associated with it. Depending on the results, the object is either forwarded to the client or substituted with a generated replacement. (Unfortunately, analyzing objects that are placed with JavaScript doesn't work, as their URLs in the JavaScript source cannot be altered.)
.SH BUGS
Please report any bugs you may find to:
.P
.B http://sourceforge.net/projects/bfilter
.SH AUTHOR
Joseph Artsimovich
.br
http://bfilter.sourceforge.net
.SH SEE ALSO
regex(7)