Home Matt McIrvin mmcirvin@world.std.com

A simple trick that makes Usenet more pleasant,

or: How to killfile massive crossposts

(and related commentary on Usenet technology and culture)

By Matt McIrvin

Contents

  1. Filtering
  2. Crossposted to hell and back
  3. Examples of simple filtering
  4. Regular expressions
  5. The trick
  6. But what about bandwidth?
  7. But what if I can't filter Usenet?
  8. But what about spam?

1. Filtering

When Usenet was invented in the late seventies, it was assumed that the volume of posts would be fairly low, and that any subscriber to a newsgroup would want to read everything in the newsgroup. Early news-reading software had no capability for deliberately ignoring things.

Since then, and especially in the last few years, the volume of posts on Usenet has greatly increased, and with it the volume of aggravating, hostile, and off-topic posts. (I use "Usenet" colloquially to mean net news in general, including things that aren't technically part of Usenet, such as the alt. hierarchy.) You can complain about this on Usenet, but it's unlikely to do any good, and the complaining is usually as tiresome as the annoying posts themselves. Alternatively, most newsreaders allow you to pick out the articles you want to read by their subject and author information, but that's still a time-consuming manual process. It's better to combine this with some sort of automatic filtering mechanism to screen out things that are going to annoy you before you ever see them, or even to pick out the wheat from the chaff and ignore everything else by default.

Historically, on Unix systems, automatic filtering actually appeared before "point-and-shoot" newsreaders; rn, which had no screen-based article selector, could do simple filtering.

Unfortunately, today it's considered an advanced capability for Net gurus, even though it's not really that hard to use. The first generation of graphical PC and Mac newsreaders designed to run over SLIP/PPP connections often didn't have automatic filtering at all. Also, many people now use World Wide Web browsers as newsreaders, and they usually have little or no filtering capability.

Therefore, many Usenet users either don't have newsreaders with this kind of capability, or don't know how to use it. It's worth closely studying your newsreader's documentation to see if this is available; the capability is often referred to as "killfiles," "scorefiles," "filters," or "memorized commands."

The Unix newsreaders nn, trn, and strn have filters of varying power. Some newsreaders for the Macintosh that provide flexible filtering are YA-NewsWatcher, MT-NewsWatcher, and MacSOUP. Commercial newsreaders for Windows that have some sort of filtering are Gravity and Forte Agent. Of all of these, I think that nn is the only one that can't do the trick I describe below, though I'm not sure about Agent.

Usually, filtering features allow you to instruct the newsreader to search lines in the article header, such as the From: or Subject: lines, for certain patterns. When a pattern is found, they can "junk" (that is, not display) matching articles, or automatically display them if the search criterion is for something you do want to look at. "Scorefiles," as implemented in strn and YA- and MT-NewsWatcher, give you finer control: multiple search criteria can be used to adjust a numerical score for each article, with only articles above a certain minimum score appearing on your screen.

It's usually not hard to figure out how to do simple things, such as junking all articles by a certain author or with a certain subject, using the documentation that comes with your newsreader. More complicated tricks, though, are not obvious even if you know all of the commands. This is because their efficacy depends not only on the mechanics of the newsreader, but on the culture of Usenet. In order to program a machine to seek out and junk nuisances, you have to know their identifying marks.

2. Crossposted to hell and back

One of the most common Usenet annoyances is really quite easy to spot: it is the article that is, to use the traditional phrase, "crossposted to hell and back." The groups to which an article gets posted are listed in a line in the article's header, which begins with the word "Newsgroups:". You can "crosspost" an article to multiple newsgroups by putting the names of all the groups in the Newsgroups: line, separated by commas. The article only gets sent once to each site, but readers of all of the groups in the list will see it. Followups to the article (that is, replies that are posted rather than sent as e-mail) will also appear in all of the groups, unless the Followup-To: line in the header says otherwise, or the responding poster edits the Newsgroups: line in the followup.

This holds great potential for mischief, intentional and unintentional. It's always a good idea to check and, if necessary, edit the Newsgroups: line when composing a followup. However, not everybody does this, and with some crudely written news software it isn't even possible. If a dozen newsgroups appear in the Newsgroups: line, the article will appear in all of them, and so will many of the replies.

If an inflammatory article gets posted to all of those groups, the resulting shouting match is likely to show up in many places. Readers who don't understand crossposting will complain about the article's appearance in "this group" in a response that gets crossposted everywhere, without even specifying what "this group" is. Readers will assume that all participants in the argument have read articles that they probably haven't, causing more friction. To make matters worse, sometimes the Followup-To: line in a "flamebait" post is maliciously designed to send followups to different, inappropriate groups, if the respondents don't edit their Newsgroups: lines. The resulting confusion can plunge several newsgroups into temporary chaos: a multi-group "flamewar."

Another annoying phenomenon is the "cascade," a thread in which every poster quotes the entire preceding message and adds one line. By now, Usenet veterans find these extremely tiresome, and if they appear in a single group, they don't last long. But if they're massively crossposted, there will always be new readers who have never seen one before and think they're cute, so the stupid thing propagates for ages.

(You might worry that I'm giving people instructions for social disruption here. Actually, you can see all of these phenomena in action just by reading Usenet for a week or two, and people fond of causing chaos find out about them rapidly. Anyway, if more people did what I describe below, it wouldn't even be an issue.)

There are other annoying articles that sometimes get massively crossposted. When advertisements are "spammed" to hundreds of newsgroups, they are sometimes crossposted to ten or twenty newsgroups at a time. Conspiracy theorists, pseudoscientists, and religious fanatics often crosspost their ramblings to dozens of newsgroups so as to give their IMPORTANT information the widest possible distribution.

As a general rule, articles crossposted to more than two or three newsgroups are almost never of interest to anybody. Information of sufficiently general interest to be on topic in many groups, such as a FAQ for a whole newsgroup hierarchy, is better off being posted to one or two groups of more general scope, or made available via FTP, gopher, or the World Wide Web, since most likely only a minority of the population in any given newsgroup will actually want to see it. FAQs of this sort are gradually moving to the Web.

Massive crossposting is, therefore, an identifying characteristic of a nuisance post. You can often set your newsreader to junk these posts automatically, by using its filtering capabilities.

(Indeed, if all they want to do is junk massive crossposts, users of the commercial Windows newsreader called Gravity can stop reading right now. There's an easy-to-use option that does it automatically. This is fortunate, since Gravity's filters can't otherwise access the necessary header lines.)

3. Examples of simple filtering

In what follows, I'll use trn killfile entries as examples, because a trn killfile is entirely textual (it's literally a text file called KILL in a directory corresponding to the newsgroup), so it's easy to describe the commands in a text document. (This isn't intended as comprehensive documentation for trn killfiles; for that, type "man trn" at the Unix command line on a system that has trn. If you want to use trn and are completely baffled by the following, you might want to read an introduction to trn first.) Other newsreaders use different methods, but it's usually easy to figure out how to use them from the documentation.

It's usually easy to tell a newsreader to look for a particular pattern in some header line. For instance, in a trn killfile, the line

/McIrvin/f:j

will search the From: line for the word "McIrvin," and junk everything containing that name. It will therefore junk all articles that Matt McIrvin wrote.

If I were using trn and wanted to search the Newsgroups: line in the header to junk all articles crossposted to alt.my.head.hurts, I could put a line in the killfile that reads

/alt.my.head.hurts/Hnewsgroups:j

and that would do the trick. However, if there is a separate news server, it might slow things down, because, unlike the Subject: and From: lines, Newsgroups: is usually not fetched from the news server at article-selection time by default. (Indeed, nn can only filter on the Subject: and From: lines, which is why it can't do the trick described below. As a consolation, it can apply filters to those header lines at incredible speed.)

It's faster to search the Xref: line, if you can. (It is not always possible.) This is a header line which is locally generated at your site, that gives the locally available groups to which the article is posted and its local article number within each group.

(A brief note of technical bafflement, which you can safely ignore: On some news servers, this line is fetched along with the Subject: and From: lines with something called the XOVER command, and the Newsgroups: line usually is not. That's the explanation I hear most often for why Xref: is faster. On the other hand, I know for a fact that it's also faster to search Xref: on the news server that I use, which does not send Xref: via XOVER (it's faster even when I am explicitly not using XOVER), so this is still a mystery to me.)

Anyway, in trn,

/alt.my.head.hurts/Hxref:j

would junk all of the articles mentioning alt.my.head.hurts in their Xref: lines. But how do I do something more complicated? If I want to junk any article crossposted to, say, four or more newsgroups no matter what they are, that's not a simple string search.
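A newsreader's basic header filter can be sketched in a few lines of Python. This is only an illustration under assumptions: Python's `re` module stands in for the newsreader's own pattern matcher, the article is a hypothetical dictionary of header values, and `should_junk` is an invented name, not part of trn or any other program.

```python
import re

# A hypothetical article, as a newsreader might hold its headers after fetching them.
article = {
    "From": "Matt McIrvin <mmcirvin@world.std.com>",
    "Subject": "Re: heavy quark masses",
    "Newsgroups": "sci.physics,alt.my.head.hurts",
    "Xref": "news.example.com sci.physics:1234 alt.my.head.hurts:567",
}

def should_junk(headers, header_name, pattern):
    """Junk the article if `pattern` occurs anywhere in the named header.

    As in a trn killfile, this is a regular-expression search, so the periods
    in "alt.my.head.hurts" actually match any character, not just a period.
    A missing header is searched as an empty string.
    """
    value = headers.get(header_name, "")
    return re.search(pattern, value) is not None

print(should_junk(article, "From", "McIrvin"))              # True
print(should_junk(article, "Xref", "alt.my.head.hurts"))    # True
print(should_junk(article, "Subject", "MAKE.MONEY.FAST"))   # False
```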

4. Regular expressions

Many newsreaders support searching for "regular expressions." These are search patterns that are more flexible than simple search strings. They're a little like the "wild card" patterns available in DOS and Unix, but they're different, and provide finer control. You should consult your documentation to find out if you can use "regexps," as they are called, and to get more details. On a Unix system, the command "man 5 regexp" will provide a full explanation, but note that some newsreaders only partially implement regular expressions; the regexps supported by trn seem to be the ones described in the ed manual, accessed via "man ed".

The following is not a complete guide to regular expressions, but is more than enough to understand the rest of this essay, and to do many other things with killfiles that you can't do with simple string matching. (If your eyes glaze over, don't worry; I'll repeat the important ones later on.)

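In a regular expression,

  "." matches any single character;
  "*" matches zero or more repetitions of whatever precedes it;
  "+" matches one or more repetitions of whatever precedes it;
  "?" makes whatever precedes it optional;
  parentheses group things together, so "(Re:)?" means an optional "Re:";
  "[abc]" matches any one of the characters in the brackets, and "[^abc]" matches any character except those;
  "^" at the start of a pattern matches the beginning of the line;
  a backslash makes a special character ordinary, so "\." matches a literal period.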

As far as I can tell, no two versions of regexps are quite the same. Most of the features above will often be supported, and there are usually others as well.
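Since dialects differ, it can help to try the operators used in this essay on a few strings before trusting them in a killfile. The sketch below uses Python's `re` module, which is just one dialect among many; the operators shown are widely supported, though not universally, so check your newsreader's documentation for its own.

```python
import re

# (pattern, text) pairs; re.search looks for the pattern anywhere in the text,
# which is how a killfile search behaves.
examples = [
    ("a.c", "abc"),        # "." matches any single character: found
    ("ab*c", "ac"),        # "*" allows zero copies of "b": found
    ("ab+c", "ac"),        # "+" demands at least one "b": not found
    ("[^,]", ","),         # "[^,]" is anything but a comma: not found
    ("^Re:", "Re: spam"),  # "^" ties the pattern to the start: found
    ("^Re:", "FW: Re:"),   # "Re:" later in the line doesn't count: not found
]

results = [re.search(pattern, text) is not None for pattern, text in examples]
for (pattern, text), found in zip(examples, results):
    print(f"{pattern!r:8} in {text!r:12} -> {found}")
```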

5. The trick (in several flavors)

The basic trick

Recall that "." matches any single character, and "*" means "zero or more of whatever precedes it."

So ".*" would mean "any string of characters".

Given that, you can probably think of one way to junk massively crossposted articles. The names of newsgroups in the Newsgroups: line are separated by commas, so you could search for blocks of arbitrary characters separated by commas in the Newsgroups: line. For instance, in a trn killfile,

/,.*,.*,/Hnewsgroups:j

would junk all articles with three commas somewhere in the Newsgroups: line, that is, all articles crossposted to four or more newsgroups.
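As a sanity check, the comma pattern can be compared with simply counting commas; in this simple form the two are the same test. A minimal sketch in Python, with Python's `re` standing in for trn's matcher and `crossposted_to_four_or_more` an invented helper name:

```python
import re

# The trn pattern from the text, between the slashes: three commas, anywhere.
FOUR_OR_MORE = re.compile(r",.*,.*,")

def crossposted_to_four_or_more(newsgroups_value):
    """True if the Newsgroups: value names four or more comma-separated groups."""
    return FOUR_OR_MORE.search(newsgroups_value) is not None

three = "sci.physics,sci.astro,alt.sci.physics.new-theories"
four = "sci.physics,sci.astro,alt.sci.physics.new-theories,misc.misc"
print(crossposted_to_four_or_more(three))  # False
print(crossposted_to_four_or_more(four))   # True

# For this simple pattern, the regexp is the same test as counting commas.
print(all(crossposted_to_four_or_more(v) == (v.count(",") >= 3) for v in (three, four)))  # True
```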

But, as I said, it is more efficient to search the Xref: line, if it is possible to do so. Each entry in the Xref: line is a newsgroup's name followed by a colon and a number (the local article number). Since each entry has a colon, you can just search for colons:

/:.*:.*:.*:.*:/Hxref:j

would junk all articles crossposted to four or more newsgroups. There are five colons rather than three. Why? One of the extra colons is there because the colons are contained within the entries, rather than between them. The other one is there because trn considers the Xref: line to include the string "Xref: ", which has a colon in it! Many other newsreaders look only at what follows the prefix of a header line, so you would have them look for the regular expression

:.*:.*:.*:

instead, with only four colons, if you wanted to junk articles crossposted to four or more groups.
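The colon-counting difference between trn and other newsreaders can be seen directly. In this sketch the Xref: lines and the site name `news.example.com` are made up, and `strip_prefix` is a hypothetical helper that imitates a newsreader which searches only the text after the header name.

```python
import re

# Hypothetical Xref: lines; "news.example.com" stands in for your site's name.
four_groups = "Xref: news.example.com sci.physics:101 sci.astro:202 alt.test:303 misc.misc:404"
three_groups = "Xref: news.example.com sci.physics:101 sci.astro:202 alt.test:303"

# trn searches the whole line, "Xref: " included, so it needs five colons.
WITH_PREFIX = re.compile(r":.*:.*:.*:.*:")
# Newsreaders that search only what follows "Xref: " need four.
VALUE_ONLY = re.compile(r":.*:.*:.*:")

def strip_prefix(line):
    """Hypothetical helper: drop the header name and the colon after it."""
    return line.split(":", 1)[1]

for line in (four_groups, three_groups):
    print(bool(WITH_PREFIX.search(line)), bool(VALUE_ONLY.search(strip_prefix(line))))
# prints "True True", then "False False"

# The pitfall: the four-colon pattern applied to the whole line also
# catches the three-group article, because "Xref: " adds a colon.
print(bool(VALUE_ONLY.search(three_groups)))  # True
```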

However you filter massive crossposts, you will probably find that an extraordinary amount of garbage goes down the drain. In many newsgroups, a large fraction of the total traffic simply disappears, and it is usually a fraction that includes little or nothing of interest and much that is annoying.

Faster versions

I have since read, in Terje Bless's YA-NewsWatcher FAQ and Stefan Haller's MacSOUP manual, that to match this sort of regular expression more efficiently, it is better to specify that the strings of arbitrary characters can contain any character other than a comma or colon, as the case may be. That way, there are fewer ways that the regular expression could possibly match the string; the algorithm doesn't have to back up and try lots of different possibilities, so the matching takes fewer processor cycles. The difference could be very large if the number of crossposted groups you can tolerate is large.

It's not hard to do this with regular expressions, though the resulting expressions look a little more cryptic. Recall that "[^,]" is "any character except a comma", and "[^:]" is "any character except a colon". So a faster version of the crosspost-killing filter, for trn, would be

/,[^,]*,[^,]*,/Hnewsgroups:j

(which means "three commas separated by any number of things other than commas") or

/:[^:]*:[^:]*:[^:]*:[^:]*:/Hxref:j

("five colons separated by any number of things other than colons").

Similarly, "+" means "one or more of whatever precedes it," where "*" means "zero or more." So an even faster flavor of the filters above would use "+" instead of "*":

/,[^,]+,[^,]+,/Hnewsgroups:j

/:[^:]+:[^:]+:[^:]+:[^:]+:/Hxref:j

to avoid checking for zero-character strings, since it is rare that a message header is so malformed that nothing appears between two commas or colons. This is the speediest version of the crosspost-killer that I know. Based on my informal experiments, however, the single most important thing is just to use "[^,]" or "[^:]" instead of the period.

(Some versions of the above advice recommend another "[^,]+" or "[^:]+" prior to the first comma or colon. In my experiments with MacSOUP, that actually makes the filter very slightly slower.)
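The speed argument can be tried out informally with Python's `timeit` module. This sketch assumes Python's backtracking `re` engine is a fair stand-in for a newsreader's matcher, which it may not be; the header lines are invented, and the timings it prints will vary from machine to machine, so no particular numbers are claimed here. It also confirms that the three trn-style filters agree on well-formed lines.

```python
import re
import timeit
from functools import partial

# The three Xref: filters from the text, trn style (five colons, "Xref: " included).
PATTERNS = {
    "SLOW": re.compile(r":.*:.*:.*:.*:"),
    "FASTER": re.compile(r":[^:]*:[^:]*:[^:]*:[^:]*:"),
    "FASTEST": re.compile(r":[^:]+:[^:]+:[^:]+:[^:]+:"),
}

# Hypothetical Xref: lines with three and four groups.
headers = [
    "Xref: news.example.com sci.physics:101 sci.astro:202 alt.test:303",
    "Xref: news.example.com sci.physics:101 sci.astro:202 alt.test:303 misc.misc:404",
]

# On well-formed lines all three patterns give the same verdict.
verdicts = [[bool(p.search(line)) for p in PATTERNS.values()] for line in headers]
print(verdicts)  # [[False, False, False], [True, True, True]]

# Rough timing on the three-group line, where the search fails and a
# backtracking engine has to try many ways of placing the colons.
for name, pattern in PATTERNS.items():
    seconds = timeit.timeit(partial(pattern.search, headers[0]), number=5000)
    print(f"{name:8} {seconds:.4f} s")
```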

Of course, with a newsreader other than trn, you should follow that newsreader's instructions for a regular expression filter on the Newsgroups: or Xref: line, where the regular expression is the thing that I have written between the slashes. Also, once again, in many newsreaders the proper Xref: regular expression to kill on four or more newsgroups would be

:[^:]+:[^:]+:[^:]+:

with only four colons, because the string "Xref: " would not be searched, just the rest of the line.

If you are reading news over a phone line with a fast PC, and your tolerance for crossposting is low, it probably takes much longer to fetch the headers than to match the patterns anyway; but if you are using a slower or more heavily loaded computer with a faster connection to the news server (as is often the case at universities), and/or you are killing, say, articles crossposted to ten or more newsgroups, it could become quite important to optimize the regexps for processing speed.

More powerful versions

Some newsreaders' implementations of regular expressions have bounded-repeat operators that you can use to make repetitive expressions such as the above much shorter and easier to type. They are not implemented everywhere, so I haven't used them in my examples.
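In Python's dialect, for instance, a bounded repeat is written with braces, so the four-colon Xref: filter (for newsreaders that skip the "Xref: " prefix) can be shortened as below. The `xref_killer` helper is hypothetical, and other dialects spell the operator differently (POSIX basic regexps use backslashed braces), so treat this as a sketch.

```python
import re

# Four-colon Xref: filter (prefix excluded), written out as in the text.
spelled_out = re.compile(r":[^:]+:[^:]+:[^:]+:")
# The same pattern with a bounded repeat: a colon, then "gap, colon" three times.
bounded = re.compile(r":(?:[^:]+:){3}")

def xref_killer(min_groups):
    """Hypothetical helper: match an Xref: value (without the "Xref: " prefix)
    listing at least min_groups groups, i.e. at least min_groups colons."""
    return re.compile(r":(?:[^:]+:){%d}" % (min_groups - 1))

value = " news.example.com a.b:1 c.d:2 e.f:3 g.h:4"
print(bool(spelled_out.search(value)), bool(bounded.search(value)))  # True True
print(bool(xref_killer(10).search(value)))  # False: only four groups
```

With a tolerance of ten groups, the spelled-out version would need ten colons and nine gaps typed by hand; the bounded form only changes one number.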

If you have "scorefile" capability, you can refine your crosspost filter to not junk massively crossposted posts that you want to read. My MT-NewsWatcher scorefile takes advantage of this. If I want to junk articles that are crossposted to four or more groups unless my favorite poster, George Quimby, writes them, I can create a filter that gives anything crossposted to four or more groups a demotion of, say, 350 points, but also create another filter that credits posts with George Quimby's address in the From: line with a positive score of 400 points. So if he posted the article, the score ends up positive and I see it, but otherwise, it gets junked.

Scorefiles even allow me to be a little more daring. Sometimes an article with interesting content gets crossposted to precisely three newsgroups; it's uncommon, but not so uncommon as with four or more. If I add yet another filter that gives anything crossposted to three or more groups a smaller demotion, say 10 points, then articles crossposted to three groups will get a demotion, but the credit necessary to put them over the line and appear on my screen is not so big. If I find posts with the string "heavy quark" in the Subject: line just interesting enough to get a credit of 100 points, I'll see them if they are crossposted to three groups, but not to four--heavy quarks aren't as inherently interesting as George Quimby's posts. With a simple killfile that junked everything that matches the regexp, I probably wouldn't be willing to junk everything crossposted to three groups, but with scorefiles it becomes reasonable to do this.
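The arithmetic is easy to check with a small scoring sketch. The rules mirror the numbers in the text, but the rule format, the zero threshold, the `score` and `make_article` helpers, and the addresses are all assumptions for illustration; this is not MT-NewsWatcher's scorefile syntax.

```python
import re

# Hypothetical rules mirroring the numbers in the text: (header, regex, points).
# The Xref: patterns assume the value is searched without its "Xref: " prefix,
# and "george.quimby@example.com" is an invented address.
RULES = [
    ("Xref", r":[^:]+:[^:]+:", -10),           # three or more groups
    ("Xref", r":[^:]+:[^:]+:[^:]+:", -350),    # four or more groups
    ("From", r"george\.quimby@example\.com", 400),
    ("Subject", r"heavy quark", 100),
]
THRESHOLD = 0  # assumed: articles scoring below this are junked

def score(article):
    """Sum the points of every rule whose pattern is found in its header."""
    return sum(points for header, pattern, points in RULES
               if re.search(pattern, article.get(header, "")))

def make_article(sender, subject, n_groups):
    """Build a hypothetical article crossposted to n_groups groups."""
    entries = " ".join(f"group{i}.test:{100 + i}" for i in range(n_groups))
    return {"From": sender, "Subject": subject, "Xref": "news.example.com " + entries}

cases = [
    ("george.quimby@example.com", "Anything at all", 4),  # -10 - 350 + 400 = 40
    ("someone@example.org", "heavy quark masses", 3),      # -10 + 100 = 90
    ("someone@example.org", "heavy quark masses", 4),      # -360 + 100 = -260
    ("someone@example.org", "Ordinary post", 2),           # 0
    ("someone@example.org", "Ordinary post", 3),           # -10
]
for sender, subject, n in cases:
    s = score(make_article(sender, subject, n))
    print(f"{s:5d} {'shown' if s >= THRESHOLD else 'junked'}: {subject} ({n} groups)")
```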

Some newsreaders don't have scorefiles, but do let you combine different filters using logical operators like "and" and "or", or they let you control the priority with which filters override one another. That would let you do essentially the same things I described above, by combining different filters.

Newsreaders that keep extensive threading databases, such as trn and MacSOUP, have another feature that multiplies the power of killfile filters. You have the option of making a filter apply, not just to the article that it matches, but to every article in every sub-thread that follows up to that article. The way you do this in trn is by replacing the letter "j" in the killfile lines above with a comma ",". (Read your documentation to find out how to do it elsewhere.)

This is very useful when killfiling perennially annoying posters who tend to provoke flamewars. It is also useful when applied to massive crossposts. If the crosspost provokes a multi-group flamewar, this sort of filter will catch even the followups whose authors are smart enough to edit the Newsgroups: line. This may cut too broad a swath for some tastes; experiment until you find what you like.
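The idea behind thread-wide killing can be sketched as a walk up the chain of parent articles: an article is killed if it, or anything it descends from, matched the filter. This is not how trn or MacSOUP implement it; the message IDs and the `parents` table are hypothetical, standing in for what a newsreader would build from References: headers.

```python
# Hypothetical thread structure: each Message-ID maps to the Message-ID of the
# article it follows up, or None for the start of a thread. A real newsreader
# would build this from the References: headers.
parents = {
    "<1@a>": None,     # massively crossposted flamebait
    "<2@b>": "<1@a>",  # followup whose author trimmed the Newsgroups: line
    "<3@c>": "<2@b>",  # reply to that followup
    "<4@d>": None,     # an unrelated thread
}
matched = {"<1@a>"}  # articles the crosspost filter caught directly

def killed(msg_id):
    """True if this article, or any article it descends from, was matched.

    Assumes the parent links contain no cycles.
    """
    while msg_id is not None:
        if msg_id in matched:
            return True
        msg_id = parents.get(msg_id)
    return False

print(sorted(m for m in parents if killed(m)))  # ['<1@a>', '<2@b>', '<3@c>']
```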

6. But what about bandwidth?

You might object that merely ignoring the annoying phenomena I mentioned above, filtering them out of effective existence, is a dangerously defeatist thing to do. After all, don't these messages waste network resources, such as bandwidth (which is communications jargon for data-transmission capacity) and storage space, whether or not people auto-ignore them? Wouldn't it be better to elevate the level of discourse by vehement protest against these antisocial practices, thereby making the net better for everyone?

In fact, they do waste network resources, but not to the extent you'd think. Massively crossposted articles do not take up any more space, or take any more bandwidth to transmit, than articles that are posted to only one group. On the other hand, massive crossposting can sustain cascades and inflame flamewars, thereby wasting bandwidth and storage space. But keep in mind that any textual message which is not actually spammed is a drop in the ocean, a grain of sand, compared to the typical sound recording or photograph. These nontextual forms of information make up a large fraction of all Usenet traffic, and they typically propagate without massive crossposting; people put them in newsgroups devoted specially to their distribution. And, actually, most of the drain on network resources today is associated not with Usenet at all, but with the World Wide Web.

The biggest waste associated with massive crossposting is not a waste of network resources, but of readers' precious time (including connect time with service providers, which sometimes costs money). News filtering mechanisms can help eliminate that waste. An attempt to shame an annoyance off the net will, if anything, increase that waste, because it will amount to picking a fight. Usenet posters don't like to be told what to do. The cultural norms of the medium are based on a libertarian, even anarchic ideal, and on some groups, self-appointed "net cops" are more despised than flamebaiters. In this case, a technical fix is really superior to an attempt to reform the social order.

7. But what if I can't filter Usenet?

Of course, some users simply don't have the option of using filtering, because of the nature of their Usenet access.

College accounts

With academic accounts this usually isn't a problem. Most college computer accounts supply Unix shell access, and Unix shell newsreaders have always been the first ones to incorporate content-oriented features like sophisticated filtering (as opposed to user interface bells and whistles).

All modern Unix newsreaders that I know of support filtering in some form, and some support regular expressions. It's likely that at least one newsreader with killfiles will be available, and moderately likely that you can use one powerful enough to kill crossposts. Try typing "man trn" or "man strn" to find out whether these newsreaders are available, and to read the documentation if they are.

On-line services

The situation can be different if you access the Internet through a big commercial on-line service. The basic product of the big on-line services like AOL is ease of set-up, not the use of power tools. When you sign up, you get proprietary client software, which typically includes a primitive Usenet newsreader that doesn't allow anything as sophisticated as filtering. Often, it is based on the software used to read the service's internal discussion groups; but that is not entirely appropriate, since internal discussion groups are a much more controlled environment than Usenet, with correspondingly less need for sophisticated filtering. Sometimes, they don't even let you edit Usenet headers. This was probably intended to prevent mischief, but it really only makes the user more vulnerable to mischief by other people!

The on-line services enhance their client software from time to time, and they now allow the use of some third-party client software. The situation is gradually improving; the distinction between on-line services and full Internet service providers is blurring. You may be able to use a newsreader other than the one that came with your on-line service.

Reading news via an Internet connection to your computer

Today, though, the best solution is probably to get a full-fledged Internet connection, via phone-line PPP or a fancy broadband connection, from an Internet service provider (ISP). Then you can use powerful, flexible third-party Internet tools that can be obtained as commercial software, shareware or even freeware. Often, your ISP will provide you with copies of some of them when you get your account, and they are also available via FTP or Web sites (where you can usually find them with search engines). If the ones provided don't have the features you want, you can always find others.

Many people who do have PPP accounts or direct Internet connections read Usenet using the newsreading programs that came with, or are incorporated into, their Web browsers. Over the years these clients have gradually gotten some sort of filtering capability, but they usually still don't have much.

If you're dissatisfied with the newsreading features provided by your Web browser, consider using a separate, dedicated newsreader instead. Most of them can now identify URLs embedded in Usenet posts, and pass them to your Web browser when you want to follow them.

8. But what about spam?

The trouble with Usenet spam

I mentioned "spam" in a couple of places above. The term "spam" today mostly refers to junk e-mail, but historically it arose in the context of Usenet, and it still happens there.

Here, by spam I mean a message that gets posted separately by automatic means to many newsgroups, without the use of the proper crossposting mechanism. (The name is derived from an old Monty Python sketch in which the word "Spam" is shouted ad nauseam, and is not meant to cast aspersions upon the noble processed-meat product.) It is also sometimes called "EMP", short for "excessive multi-posting".

Spam is evil because, unlike crossposts, it gets sent and stored multiple times at every site. Also, if you subscribe to multiple newsgroups in which the spam appears, you will see it multiple times.

Let me be clear: I'm not talking about somebody who manually posts the same message to a few different newsgroups. That happens, and sometimes it's a minor annoyance, but the person responsible usually just doesn't know any better.

(At least one person used to regularly repost the same messages manually to dozens of newsgroups. Sometimes there were indications that he was actually typing them in over and over--they were deranged manifestoes about antigravity, Communists on the moon, cannibalistic flying-saucer gods, and psychic houseplants.)

Here, though, I'm talking about someone who posts the same message separately to a hundred or a thousand newsgroups. Usually, the culprit is an entrepreneur who hires a naïve or amoral programmer to post an advertisement all over Usenet with an automatic script.

Backup methods

Much spam is also crossposted in the usual way to ten or twenty newsgroups at a time, and this filtering method will get rid of it. It will not, however, eliminate the most evil variety of spam, in which the message is only posted to one or two newsgroups at a time. Fortunately, you can filter out much of this by other, more ad-hoc means.

For instance, some spam posts are chain letters or ads for pyramid scams, and they usually contain eye-catching phrases about making big money (or "$$$") in the Subject: line. One of the all-time classic annoyances is a seemingly eternal chain letter called "MAKE.MONEY.FAST". There are many similar variants. (If you've actually read this far, you're probably too intelligent to fall for them, so I'll omit the warning.) Searching for strings like "money" in the Subject: line can be helpful, depending on the newsgroup. If you have scorefiles, you can even let through benign posts that contain these strings in the Subject: line but are by people you like, or which contain something else in the article header that implies that they might actually be of interest.

Many other spam posts advertise pornography, cosmetic or weight-loss products, or health nostrums. So that's another potential source of search criteria.

Here's a really good one that I just figured out. Many spam posts have a non-alphanumeric character (often a dollar sign, asterisk, or exclamation point) as the first character of the Subject: line, so that they will show up at the top of an alphabetized subject list in a newsreader's article selector. So you can kill a lot of spam by just killing anything whose Subject: line begins with something other than a letter or number. Actually, I'd allow quotation marks (single or double) too, but show no mercy if "Re:" or "Re: " precedes the silly character (so as not to read a bunch of enraged followups). One way to do this that works in MacSOUP is to look for the regular expression

^(Re:)? *[^0-9A-Za-z"' ]

in the Subject: line.
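The pattern can be checked against a few sample subject lines. Here Python's `re` stands in for MacSOUP's matcher (the two dialects may differ in details), and the subjects are made up.

```python
import re

# The MacSOUP-style pattern from the text, in Python syntax; the raw
# triple-quoted string lets both quote characters appear in the class.
SPAM_SUBJECT = re.compile(r"""^(Re:)? *[^0-9A-Za-z"' ]""")

subjects = [
    "$$$ MAKE MONEY FAST $$$",       # junked: starts with "$"
    "Re: *** FREE OFFER ***",        # junked: "Re:" and then "*"
    "heavy quark masses",            # kept
    "Re: heavy quark masses",        # kept
    '"Quoted" subjects are spared',  # kept: quotation marks are allowed
]
verdicts = [bool(SPAM_SUBJECT.search(s)) for s in subjects]
print(verdicts)  # [True, True, False, False, False]
```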

Having gotten this to work in MacSOUP, in an earlier version of this page I blithely generalized to trn without experimentation. This regexp doesn't work in trn, because of the issue of whether the word "Subject: " is part of the Subject: line, and the lack of support for that particular use of parentheses. After much trial and error, I think that the easiest way to do this in trn is to use two filters:

/^Subject: *[^0-9A-Za-z"' ]/:j

/^Subject: Re: *[^0-9A-Za-z"' ]/:j

to handle the cases of new subjects and followups, respectively (this will kill the followups even if the newsreader never saw the original article, as long as the Subject: line hasn't been changed; you could also use the threaded followup-killing command for a more aggressive filter).

In other newsreaders, you might need to modify this regular expression in other ways. My guess is that the regular expressions contained in the trn commands (between the slashes) will nearly always work, except that if the newsreader doesn't consider "Subject: " to be part of the Subject: line, you would want to search for the two regexps

^ *[^0-9A-Za-z"' ]

^Re: *[^0-9A-Za-z"' ]

in the Subject: line. (In all of the above, I am forgiving of leading spaces, to be nice both to posters with fumbly fingers and to software that might put weird numbers of spaces after "Re:".)
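To check that splitting the pattern in two loses nothing, this sketch compares the single MacSOUP-style pattern with the two-pattern version, both on bare subject values and on whole lines carrying the "Subject: " prefix, as trn sees them. Python's `re` is again only a stand-in (unlike trn, it does support the parentheses), and `any_hit` is an invented helper.

```python
import re

# Character class from the text: anything but a letter, digit, quote or space.
BAD_FIRST = r"""[^0-9A-Za-z"' ]"""

single = re.compile(r"^(Re:)? *" + BAD_FIRST)            # one pattern, bare value
pair = [re.compile(r"^ *" + BAD_FIRST),                  # two patterns, bare value
        re.compile(r"^Re: *" + BAD_FIRST)]
trn_pair = [re.compile(r"^Subject: *" + BAD_FIRST),      # two patterns, with the
            re.compile(r"^Subject: Re: *" + BAD_FIRST)]  # "Subject: " prefix

def any_hit(patterns, text):
    """True if any of the patterns is found in text."""
    return any(p.search(text) for p in patterns)

subjects = ["$$$ MAKE MONEY FAST $$$", "Re: *** FREE OFFER ***",
            "heavy quark masses", "Re: heavy quark masses", "Re:!!!"]
for s in subjects:
    print(bool(single.search(s)), any_hit(pair, s), any_hit(trn_pair, "Subject: " + s), s)
```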

Spam wastes net resources to a much greater extent than massive crossposting. However, for some of the same reasons as for crossposts, protesting spam on Usenet itself rarely does any good; also, in this case you're largely preaching to the converted.

Fortunately, because there are people working to cancel it with Usenet "cancelbots", spam probably won't provide you with a great deal of hardship. It is most visible in whimsically created alt. hierarchy groups that get essentially no other traffic. Spam and the cancels aimed at it still make up a large fraction of Usenet traffic, especially if you count by message rather than by byte, and it continues to be a huge problem for Usenet as a whole; but, because of the cancels, the end users aren't seeing much of the spam.

Junk e-mail is rapidly becoming far more annoying for individual users. Some commercial mail-reading programs now offer the equivalent of killfiles, to help you filter your mail too. But that is another story.

Last modified May 5, 2000