A simple trick that makes Usenet more pleasant,
or: How to killfile massive crossposts
(and related commentary on Usenet technology and culture)
By Matt McIrvin
Contents
- Filtering
- Crossposted to hell and back
- Examples of simple filtering
- Regular expressions
- The trick
- But what about bandwidth?
- But what if I can't filter Usenet?
- But what about spam?
1. Filtering
When Usenet was invented in the late seventies, it was assumed
that the volume of posts would be fairly low, and that any
subscriber to a newsgroup would want to read everything in the
newsgroup. Early news-reading software had no capability for
deliberately ignoring things.
Since then, and especially in the last few years, the volume of
posts on Usenet has greatly increased, and with it the volume of
aggravating, hostile, and off-topic posts. (I use "Usenet"
colloquially to mean net news in general, including things that
aren't technically part of Usenet, such as the alt. hierarchy.) You
can complain about this on Usenet, but it's
unlikely to do any good, and the complaining is usually as
tiresome as the annoying posts themselves. Alternatively, most
newsreaders allow you to pick out the articles you want to read by
their subject and author information, but that's still a
time-consuming manual process. It's better to combine this with
some sort of automatic filtering mechanism to screen out things
that are going to annoy you before you ever see them, or even to
pick out the wheat from the chaff and ignore everything else by
default.
Historically, on Unix systems, automatic filtering actually
appeared before "point-and-shoot" newsreaders; rn, which had no
screen-based article selector, could do simple filtering.
Unfortunately, today it's considered an advanced capability for
Net gurus, even though it's not really that hard to use. The first
generation of graphical PC and Mac newsreaders designed to run over
SLIP/PPP connections often didn't have automatic filtering at all.
Also, many people now use World Wide Web browsers as newsreaders,
and they usually have little or no filtering capability.
Therefore, many Usenet users either don't
have newsreaders with this kind of capability, or don't know
how to use it. It's worth closely studying your newsreader's
documentation to see if this is available; the capability is often
referred to as "killfiles," "scorefiles," "filters," or "memorized
commands."
The Unix newsreaders nn, trn, and strn have filters of varying
power. Some newsreaders for the Macintosh that provide flexible
filtering are YA-NewsWatcher, MT-NewsWatcher, and MacSOUP.
Commercial newsreaders for Windows that have some sort of filtering
are Gravity and Forte Agent. Of all of these, I think that nn is
the only one that can't do the trick I describe below, though I'm
not sure about Agent.
Usually, filtering features allow you to instruct the newsreader
to search lines in the article header, such as the From: or
Subject: lines, for certain patterns. When a pattern is found, the
newsreader can "junk" (that is, not display) matching articles, or
automatically display them if the search criterion is for something
you do want to look at. "Scorefiles," as implemented in
strn and YA- and MT-NewsWatcher, give you finer control: multiple
search criteria can be used to adjust a numerical score for each
article, with only articles above a certain minimum score appearing
on your screen.
It's usually not hard to figure out how to do simple things,
such as junking all articles by a certain author or with a certain
subject, using the documentation that comes with your newsreader.
More complicated tricks, though, are not obvious even if you know
all of the commands. This is because their efficacy depends not
only on the mechanics of the newsreader, but on the culture of
Usenet. In order to program a machine to seek out and junk
nuisances, you have to know their identifying marks.
2. Crossposted to hell and back
One of the most common Usenet annoyances is really quite easy to
spot: it is the article that is, to use the traditional phrase,
"crossposted to hell and back." The groups to which an article gets
posted are listed in a line in the article's header, which begins
with the word "Newsgroups:". You can "crosspost" an article to
multiple newsgroups by putting the names of all the groups in the
Newsgroups: line, separated by commas. The article only gets sent
once to each site, but readers of all of the groups in the list
will see it. Followups to the article (that is, replies that are
posted rather than sent as e-mail) will also appear in all of the
groups, unless the Followup-To: line in the header says otherwise,
or the responding poster edits the Newsgroups: line in the
followup.
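The Newsgroups: and Followup-To: mechanics above are easy to mimic in
a few lines. What follows is only a rough Python sketch: the
dictionary of headers and both function names are made up for
illustration, and it ignores special cases such as
"Followup-To: poster".

```python
def parse_newsgroups(header_value):
    """Split a Newsgroups: (or Followup-To:) value into group names."""
    return [g.strip() for g in header_value.split(",") if g.strip()]


def followup_groups(headers):
    """Return the groups a followup lands in if the poster edits nothing.

    `headers` is a hypothetical dict of header name -> header value.
    Followup-To: wins when present; otherwise Newsgroups: is reused.
    """
    target = headers.get("Followup-To") or headers["Newsgroups"]
    return parse_newsgroups(target)


article = {
    "Newsgroups": "alt.my.head.hurts,sci.physics,rec.pets.cats",
    "Followup-To": "sci.physics",
}
print(parse_newsgroups(article["Newsgroups"]))  # every group that sees it
print(followup_groups(article))                 # where replies go by default
```

Without the Followup-To: entry, the second call would return all
three groups, which is how a flamewar spreads.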
This holds great potential for mischief, intentional and
unintentional. It's always a good idea to check and, if necessary,
edit the Newsgroups: line when composing a followup. However, not
everybody does this, and with some crudely written news software it
isn't even possible. If a dozen newsgroups appear in the
Newsgroups: line, the article will appear in all of them, and so
will many of the replies.
If an inflammatory article gets posted to all of those groups,
the resulting shouting match is likely to show up in many places.
Readers who don't understand crossposting will complain about the
article's appearance in "this group" in a response that gets
crossposted everywhere, without even specifying what "this group"
is. Readers will assume that all participants in the argument have
read articles that they probably haven't, causing more friction. To
make matters worse, sometimes the Followup-To: line in a
"flamebait" post is maliciously designed to send followups to
different, inappropriate groups, if the respondents don't edit
their Newsgroups: lines. The resulting confusion can plunge several
newsgroups into temporary chaos: a multi-group "flamewar."
Another annoying phenomenon is the "cascade," a thread in which
every poster quotes the entire preceding message and adds one line.
By now, Usenet veterans find these extremely tiresome, and if they
appear in a single group, they don't last long. But if they're
massively crossposted, there will always be new readers who have
never seen one before and think they're cute, so the stupid thing
propagates for ages.
(You might worry that I'm giving people instructions for social
disruption here. Actually, you can see all of these phenomena in
action just by reading Usenet for a week or two, and people fond of
causing chaos find out about them rapidly. Anyway, if more people
did what I describe below, it wouldn't even be an issue.)
There are other annoying articles that sometimes get massively
crossposted. When advertisements are "spammed"
to hundreds of newsgroups, they are sometimes crossposted to ten or
twenty newsgroups at a time. Conspiracy theorists,
pseudoscientists, and religious fanatics often crosspost their
ramblings to dozens of newsgroups so as to give their
IMPORTANT information the widest possible
distribution.
As a general rule, articles crossposted to more than two or
three newsgroups are almost never of interest to anybody.
Information of sufficiently general interest to be on topic in many
groups, such as a FAQ for a whole newsgroup hierarchy, is better
off being posted to one or two groups of more general scope, or
made available via FTP, gopher, or the World Wide Web, since most
likely only a minority of the population in any given newsgroup
will actually want to see it. FAQs of this sort are gradually
moving to the Web.
Massive crossposting is, therefore, an identifying
characteristic of a nuisance post. You can often set your
newsreader to junk these posts automatically, by using its
filtering capabilities.
(Indeed, if all they want to do is junk massive crossposts,
users of the commercial Windows newsreader called Gravity can stop
reading right now. There's an easy-to-use option that does it
automatically. This is fortunate, since Gravity's filters can't
otherwise access the necessary header lines.)
3. Examples of simple filtering
In what follows, I'll use trn killfile entries as examples,
because a trn killfile is entirely textual (it's literally a text
file called KILL in a directory corresponding to the newsgroup), so
it's easy to describe the commands in a text document. (This isn't
intended as comprehensive documentation for trn killfiles; for
that, type "man trn" at the Unix command line on a system that has
trn. If you want to use trn and are completely baffled by the
following, you might want to read an introduction to trn
first.) Other newsreaders use different methods,
but it's usually easy to figure out how to use them from the
documentation.
It's usually easy to tell a newsreader to look for a particular
pattern in some header line. For instance, in a trn killfile, the
line
/McIrvin/f:j
will search the From: line for the word
"McIrvin," and junk everything containing that name. It will
therefore junk all articles that Matt McIrvin wrote.
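For readers who think more easily in code, here is a minimal Python
sketch of what such a killfile line amounts to. It is not how trn is
implemented; the article dictionaries, the addresses, and the function
name are all invented, and the match here is case-sensitive for
simplicity.

```python
import re


def should_junk(article, header, pattern):
    """Return True if `pattern` is found in the named header line."""
    line = f"{header}: {article.get(header, '')}"
    return re.search(pattern, line) is not None


articles = [
    {"From": "mcirvin@example.com (Matt McIrvin)", "Subject": "Killfiles"},
    {"From": "quimby@example.com (George Quimby)", "Subject": "Re: Killfiles"},
]

# Rough equivalent of the killfile line /McIrvin/f:j
kept = [a for a in articles if not should_junk(a, "From", "McIrvin")]
for a in kept:
    print(a["From"], "|", a["Subject"])
```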
If I were using trn and wanted to search the Newsgroups: line in
the header to junk all articles crossposted to alt.my.head.hurts, I
could put a line in the killfile that reads
/alt.my.head.hurts/Hnewsgroups:j
and that would do the trick. However, if your newsreader
fetches articles from a separate news server, this might slow things
down, because, unlike the Subject: and From: lines, Newsgroups: is
usually not fetched from the server at article-selection time by default.
(Indeed, nn can only filter on the Subject: and From: lines, which
is why it can't do the trick described below. As a consolation, it
can apply filters to those header lines at incredible speed.)
It's faster to search the Xref: line, if you can. (It is not
always possible.) This is a header line, generated locally at your
site, that lists the locally available groups to which the article
is posted, along with its local article number within each
group.
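To make the layout concrete, here is a small Python sketch that pulls
an Xref: line apart. The sample line and host name are made up, the
function name is hypothetical, and the code assumes the usual layout
of a server name followed by group:number entries.

```python
def parse_xref(line):
    """Return (host, [(group, article_number), ...]) from an Xref: line.

    Accepts the line with or without the leading "Xref: " prefix.
    """
    value = line.split(":", 1)[1] if line.lower().startswith("xref:") else line
    fields = value.split()
    host, entries = fields[0], fields[1:]
    locations = []
    for entry in entries:
        # Split on the last colon: group name before, article number after.
        group, _, number = entry.rpartition(":")
        locations.append((group, int(number)))
    return host, locations


sample = "Xref: news.example.com alt.my.head.hurts:1021 sci.physics:88412"
host, locations = parse_xref(sample)
print(host)
print(locations)
```

Note that each entry carries exactly one colon, so counting colons is
a cheap way of counting groups.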
(A brief note of technical bafflement, which you can safely
ignore: On some news servers, this line is fetched along with the
Subject: and From: lines with something called the XOVER command,
and the Newsgroups: line usually is not. That's the explanation I
hear most often for why Xref: is faster. On the other hand, I know
for a fact that it's also faster to search Xref: on the news server
that I use, which does not send Xref: via XOVER (it's
faster even when I am explicitly not using XOVER), so this
is still a mystery to me.)
Anyway, in trn,
/alt.my.head.hurts/Hxref:j
would junk all of the articles mentioning
alt.my.head.hurts in their Xref: lines. But how do I do something
more complicated? If I want to junk any article crossposted to,
say, four or more newsgroups no matter what they are, that's not a
simple string search.
4. Regular expressions
Many newsreaders support searching for "regular expressions."
These are search patterns that are more flexible than simple search
strings. They're a little like the "wild card" patterns available
in DOS and Unix, but they're different, and provide finer control.
You should consult your documentation to find out if you can use
"regexps," as they are called, and to get more details (the command
"man 5 regexp" will provide a full explanation on a Unix system,
but note that some newsreaders only partially implement regular
expressions--the regexps supported by trn seem to be the ones
described in the ed manual, accessed via "man ed" on Unix).
The following is not a complete guide to regular expressions,
but is more than enough to understand the rest of this essay, and
to do many other things with killfiles that you can't do with
simple string matching. (If your eyes glaze over, don't worry; I'll
repeat the important ones later on.)
In a regular expression,
- Most characters just represent themselves.
- The period "." means "any character". "M.tt" would match "Matt
McIrvin", "Mott scattering", or "Catcher's Mitts".
- "*" means "zero or more instances of the preceding," "+" means
"one or more instances," and "?" means "zero or one instance of the
preceding." "Ma*tt" would match "Matt McIrvin", "Come home,
Maaaaatt", or "Mtt", but not "Catcher's Mitts".
- The square brackets "[]" can be used to indicate sets of
possible characters. Between them:
- "[james]" would match any one of the letters in "james".
- "-" indicates ranges: "[0-9]" matches any digit.
- If the first character is "^", it means "any character except".
"[^fqt-z]" matches any character except f, q, or a letter
from t to z.
- Usually "]", "-" and "^" are the only characters with special
meanings. You can make them just stand for themselves by putting
them in places in the list where their functions would make no
sense: e.g. "[]^-]" matches literal "]", "^", or "-", because "]"
is at the beginning, "^" is not at the beginning, and "-"
is at the beginning or end of the list.
- Some versions of regexps let you do case-insensitive searches,
but others don't. If not, you could use the square brackets to
include alternatives. "[Mm][Aa][Tt][Tt]" would match "Matt",
"MATT", or "mAtT".
- Outside of square brackets, there are other things with special
meanings, but when "\" precedes a character with a special meaning,
it just stands for itself instead. So the regexp "\$\$\$" would
match "EARN $$$ FAST" even though the "$" usually has a special
meaning.
- Normally, a regexp will match a pattern if it appears anywhere
in the string being searched. However, you can "anchor" search
patterns to the beginning of a string with "^" or to the end with
"$", so "^Matt" matches "Matt" only at the beginning of a string,
and "Matt$" matches "Matt" only at the end. "^Matt$" would match
only a string consisting of the single word "Matt".
- (In this context, it's important to know whether your software
considers, say, the characters "Subject: " to be part of the
Subject: line. Trn does, but some other newsreaders don't! Check
your documentation to make sure, or, if that doesn't work,
experiment.)
- In some versions of regular expressions, you can group
an expression with parentheses to treat it as if it were a single
character: "The Great (Matt)+ McIrvin" would match "The Great
MattMattMatt McIrvin". Another sometimes-supported feature is the
ability to use the vertical bar between parentheses to indicate
alternative subexpressions: "M(att|egan) McIrvin" would match my
name or my sister's.
As far as I can tell, no two versions of regexps are quite the
same. Most of the features above will often be supported, and there
are usually others as well.
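If you have Python handy, you can try the examples above yourself.
The sketch below uses Python's re module, which is only one regexp
flavor among many and is not the one built into trn or any other
newsreader; it happens to support every feature in the list. The third
field of each entry records whether a match is expected.

```python
import re

# (pattern, text to search, whether a match is expected)
checks = [
    (r"M.tt", "Catcher's Mitts", True),
    (r"Ma*tt", "Come home, Maaaaatt", True),
    (r"Ma*tt", "Mtt", True),
    (r"Ma*tt", "Catcher's Mitts", False),
    (r"[^fqt-z]", "fqtxyz", False),      # every character is excluded
    (r"[^fqt-z]", "fat", True),          # the "a" is allowed
    (r"[]^-]", "a-b", True),             # literal "]", "^", or "-"
    (r"[]^-]", "abc", False),
    (r"[Mm][Aa][Tt][Tt]", "mAtT", True),
    (r"\$\$\$", "EARN $$$ FAST", True),
    (r"^Matt", "Dear Matt", False),
    (r"Matt$", "Dear Matt", True),
    (r"^Matt$", "Matt McIrvin", False),
    (r"The Great (Matt)+ McIrvin", "The Great MattMattMatt McIrvin", True),
    (r"M(att|egan) McIrvin", "Megan McIrvin", True),
]

for pattern, text, expected in checks:
    found = re.search(pattern, text) is not None
    status = "ok" if found == expected else "SURPRISE"
    print(f"{status:8} {pattern!r} on {text!r}: match={found}")
```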
5. The trick (in several flavors)
The basic trick
Recall:
- The character "." means "any character".
- "*" means "zero or more instances of the preceding".
So ".*" would mean "any string of characters".
Given that, you can probably think of one way to junk massively
crossposted articles. The names of newsgroups in the Newsgroups:
line are separated by commas, so you could search for blocks of
arbitrary characters separated by commas in the Newsgroups: line.
For instance, in a trn killfile,
/,.*,.*,/Hnewsgroups:j
would junk all articles with three commas
somewhere in the Newsgroups: line, that is, all articles
crossposted to four or more newsgroups.
But, as I said, it is more efficient to search the Xref: line,
if it is possible to do so. Each entry in the Xref: line is a
newsgroup's name followed by a colon and a number (the local
article number). Since each entry has a colon, you can just search
for colons:
/:.*:.*:.*:.*:/Hxref:j
would junk all articles crossposted to four or
more newsgroups. There are five colons rather than three.
Why? One of the extra colons is there because the colons are
contained within the entries, rather than between them.
The other one is there because trn considers the Xref: line to
include the string "Xref: ", which has a colon in it! Many other
newsreaders look only at what follows the prefix of a header line,
so you would have them look for the regular expression
:.*:.*:.*:
instead, with only four colons, if you wanted
to junk articles crossposted to four or more groups.
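The colon-counting is easy to get wrong by one, so it may help to
check it mechanically. This Python sketch applies the three regular
expressions above to two invented articles; the header values, the
host name, and the function name are assumptions for illustration.

```python
import re

# The patterns from the text, all meaning "four or more groups".
NEWSGROUPS_4 = re.compile(r",.*,.*,")        # three commas
XREF_LINE_4 = re.compile(r":.*:.*:.*:.*:")   # five colons, "Xref: " included
XREF_VALUE_4 = re.compile(r":.*:.*:.*:")     # four colons, prefix stripped

three = {"Newsgroups": "a.b,c.d,e.f",
         "Xref": "news.example.com a.b:1 c.d:2 e.f:3"}
four = {"Newsgroups": "a.b,c.d,e.f,g.h",
        "Xref": "news.example.com a.b:1 c.d:2 e.f:3 g.h:4"}


def report(article):
    xref_value = article["Xref"]
    xref_line = "Xref: " + xref_value
    return {
        "newsgroups": bool(NEWSGROUPS_4.search(article["Newsgroups"])),
        "xref_line": bool(XREF_LINE_4.search(xref_line)),
        "xref_value": bool(XREF_VALUE_4.search(xref_value)),
        # The pitfall: the four-colon pattern applied to the whole line
        # also counts the colon in "Xref: " and fires one group too early.
        "value_pattern_on_whole_line": bool(XREF_VALUE_4.search(xref_line)),
    }


print("three groups:", report(three))
print("four groups: ", report(four))
```

The last entry shows why you need to know whether your newsreader
treats the "Xref: " prefix as part of the line.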
However you filter massive crossposts, you will probably find
that an extraordinary amount of garbage goes down the drain. In
many newsgroups, a large fraction of the total traffic simply
disappears, and it is usually a fraction that includes little or
nothing of interest and much that is annoying.
Faster versions
I have since read, in Terje Bless's YA-NewsWatcher FAQ and
Stefan Haller's MacSOUP manual, that to match this sort of regular
expression more efficiently, it is better to specify that the
strings of arbitrary characters can contain any character other
than a comma or colon, as the case may be. That way, there are
fewer ways that the regular expression could possibly match the
string; the algorithm doesn't have to back up and try lots of
different possibilities, so the matching takes fewer processor
cycles. The difference could be very large if the number of
crossposted groups you can tolerate is large.
It's not hard to do this with regular expressions, though the
resulting expressions look a little more cryptic. Recall that
- Square brackets can be used to indicate sets of possible
characters.
- If the first character between square brackets is a caret, it
means "any character except".
"[^,]" is "any character except a comma", and "[^:]" is "any
character except a colon". So a faster version of the
crosspost-killing filter, for trn, would be
/,[^,]*,[^,]*,/Hnewsgroups:j
(which means "three commas separated by any
number of things other than commas") or
/:[^:]*:[^:]*:[^:]*:[^:]*:/Hxref:j
("five colons separated by any number of
things other than colons").
Recall also that
- "+" means "one or more instances of the preceding".
So an even faster flavor of the filters above would use "+"
instead of "*":
/,[^,]+,[^,]+,/Hnewsgroups:j
/:[^:]+:[^:]+:[^:]+:[^:]+:/Hxref:j
to avoid checking for zero-character strings,
since it is rare that a message header is so malformed that nothing
appears between two commas or colons. This is the speediest version
of the crosspost-killer that I know. Based on my informal
experiments, however, the single most important thing is just to
use "[^,]" or "[^:]" instead of the period.
(Some versions of the above advice recommend another "[^,]+" or
"[^:]+" prior to the first comma or colon. In my experiments with
MacSOUP, that actually makes the filter very slightly slower.)
Of course, with a newsreader other than trn, you should follow
that newsreader's instructions for a regular expression filter on
the Newsgroups: or Xref: line, where the regular expression is the
thing that I have written between the slashes. Also, once again, in
many newsreaders the proper Xref: regular expression to kill on
four or more newsgroups would be
:[^:]+:[^:]+:[^:]+:
with only four colons, because the string
"Xref: " would not be searched, just the rest of the line.
If you are reading news over a phone line with a fast PC, and
your tolerance for crossposting is low, it probably takes much
longer to fetch the headers than to match the patterns anyway; but
if you are using a slower or more heavily loaded computer with a
faster connection to the news server (as is often the case at
universities), and/or you are killing, say, articles crossposted to
ten or more newsgroups, it could become quite important to optimize
the regexps for processing speed.
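If you are curious how much the choice of filler matters, a rough
timing harness is easy to write. The Python sketch below builds the
three flavors of the comma-counting regexp and times each against a
made-up Newsgroups: value that just fails to match, which is the case
where backtracking hurts most. Python's regexp engine is not the one
in your newsreader, so treat the numbers as illustrative only; the
function names and group names are invented.

```python
import re
import timeit


def crosspost_regex(min_groups, filler):
    """Build a comma-counting regexp: min_groups - 1 commas joined by filler."""
    return "," + (filler + ",") * (min_groups - 2)


def seconds_to_search(pattern, line, repeats=1000):
    """Time `repeats` searches of `line` with the compiled pattern."""
    compiled = re.compile(pattern)
    return timeit.timeit(lambda: compiled.search(line), number=repeats)


# Nine groups (eight commas): one short of a "ten or more groups" filter.
line = ",".join(f"alt.example.group{i}" for i in range(9))

for filler in (".*", "[^,]*", "[^,]+"):
    pattern = crosspost_regex(10, filler)
    print(f"{filler!r:10} {seconds_to_search(pattern, line):.4f} s")
```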
More powerful versions
Some newsreaders' implementations of regular expressions have
bounded-repeat operators that you can use to make repetitive
expressions such as the above much shorter and easier to type. They
are not implemented everywhere, so I haven't used them in my
examples.
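As an illustration of what such an operator buys you, here is the
comma-counting filter written with a bounded repeat in Python's regexp
syntax. This is only a sketch: the "{n}" and "(?:...)" notation is
Python's, the function name is made up, and your newsreader may spell
these differently or not support them at all.

```python
import re


def at_least_n_groups(n):
    """Regexp matching a Newsgroups: value that names n or more groups.

    For n = 4 this is the same pattern as ,[^,]+,[^,]+, but the
    repeated part is written once and followed by a count.
    """
    return re.compile(r",(?:[^,]+,){%d}" % (n - 2))


spelled_out = re.compile(r",[^,]+,[^,]+,")
compact = at_least_n_groups(4)

for value in ("a.b,c.d,e.f", "a.b,c.d,e.f,g.h", "a.b,c.d,e.f,g.h,i.j"):
    same = bool(spelled_out.search(value)) == bool(compact.search(value))
    print(value, "->", bool(compact.search(value)), "(agree)" if same else "(differ)")
```

The saving is small for four groups, but a filter for ten or more
groups becomes much easier to type and to read.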
If you have "scorefile" capability, you can refine your
crosspost filter to not junk massively crossposted posts
that you want to read. My MT-NewsWatcher scorefile takes advantage
of this. If I want to junk articles that are crossposted to four or
more groups unless my favorite poster, George Quimby, writes them,
I can create a filter that gives anything crossposted to four or
more groups a demotion of, say, 350 points, but also create another
filter that credits posts with George Quimby's address in the From:
line with a positive score of 400 points. So if he posted the
article, the score ends up positive and I see it, but otherwise, it
gets junked.
Scorefiles even allow me to be a little more daring. Sometimes
an article with interesting content gets crossposted to precisely
three newsgroups; it's uncommon, but not so uncommon as
with four or more. If I add yet another filter that gives anything
crossposted to three or more groups a smaller demotion,
say 10 points, then articles crossposted to three groups will get a
demotion, but the credit necessary to put them over the line and
appear on my screen is not so big. If I find posts with the string
"heavy quark" in the Subject: line just interesting enough to get a
credit of 100 points, I'll see them if they are crossposted to
three groups, but not to four--heavy quarks aren't as inherently
interesting as George Quimby's posts. With a simple killfile that
junked everything that matches the regexp, I probably wouldn't be
willing to junk everything crossposted to three groups, but with
scorefiles it becomes reasonable to do this.
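The arithmetic of the scoring scheme just described can be sketched
in a few lines of Python. This is not MT-NewsWatcher's scorefile
format; the rule table, the address, the sample articles, and the
assumption that an article is shown when its total is zero or more
are all mine. The Xref: values are written without the "Xref: "
prefix.

```python
import re

# (header, regexp, points), using the numbers from the text.
RULES = [
    ("Xref", r":[^:]+:[^:]+:[^:]+:", -350),   # four or more groups
    ("Xref", r":[^:]+:[^:]+:", -10),          # three or more groups
    ("From", r"quimby@example\.com", +400),   # hypothetical address
    ("Subject", r"heavy quark", +100),
]


def score(article):
    """Add up the points of every rule whose regexp matches its header."""
    return sum(points for header, pattern, points in RULES
               if re.search(pattern, article.get(header, "")))


def visible(article, threshold=0):
    """Assumed display rule: show the article if its score is >= threshold."""
    return score(article) >= threshold


XREF3 = "news.example.com a.b:1 c.d:2 e.f:3"
XREF4 = XREF3 + " g.h:4"

quark3 = {"Xref": XREF3, "Subject": "heavy quark decays", "From": "x@example.org"}
quark4 = {"Xref": XREF4, "Subject": "heavy quark decays", "From": "x@example.org"}
quimby4 = {"Xref": XREF4, "Subject": "hello", "From": "quimby@example.com"}

for name, art in (("quark3", quark3), ("quark4", quark4), ("quimby4", quimby4)):
    print(name, score(art), "shown" if visible(art) else "junked")
```

Because a four-group article also matches the three-group rule, it
collects both demotions, which is why the two penalties stack.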
Some newsreaders don't have scorefiles, but do let you combine
different filters using logical operators like "and" and
"or", or they let you control the priority with which filters
override one another. That would let you do essentially the same
things I described above, by combining different filters.
Newsreaders that keep extensive threading databases, such as trn
and MacSOUP, have another feature that multiplies the power of
killfile filters. You have the option of making a filter apply, not
just to the article that it matches, but to every article in every
sub-thread that follows up to that article. The way you do this in
trn is by replacing the letter "j" in the killfile lines above with
a comma ",". (Read your documentation to find out how to do it
elsewhere.)
This is very useful when killfiling perennially annoying posters
who tend to provoke flamewars. It is also useful when applied to
massive crossposts. If the crosspost provokes a multi-group
flamewar, this sort of filter will catch even the followups whose
authors are smart enough to edit the Newsgroups: line. This may cut
too broad a swath for some tastes; experiment until you find what
you like.
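The idea of killing a whole sub-thread can be sketched in Python as
follows. This is a toy model, not a description of how trn or MacSOUP
store threads: each article is a dictionary with an invented "id", a
"parent" (the message it follows up to), and a Newsgroups: value.

```python
def junk_with_followups(articles, is_nuisance):
    """Return the IDs of nuisance articles plus everything that follows them."""
    junked = {a["id"] for a in articles if is_nuisance(a)}
    changed = True
    while changed:  # keep sweeping until no new followups turn up
        changed = False
        for a in articles:
            if a["id"] not in junked and a.get("parent") in junked:
                junked.add(a["id"])
                changed = True
    return junked


def massively_crossposted(article):
    return len(article["newsgroups"].split(",")) >= 4


thread = [
    {"id": "<1>", "parent": None, "newsgroups": "a.b,c.d,e.f,g.h"},
    {"id": "<2>", "parent": "<1>", "newsgroups": "a.b"},   # poster trimmed the list
    {"id": "<3>", "parent": "<2>", "newsgroups": "a.b"},
    {"id": "<4>", "parent": None, "newsgroups": "a.b"},    # unrelated thread
]

print(sorted(junk_with_followups(thread, massively_crossposted)))
```

Articles <2> and <3> would slip past a plain crosspost filter, since
their own Newsgroups: lines are short; following the thread catches
them.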
6. But what about bandwidth?
You might object that merely filtering the annoying
phenomena I mentioned above out of effective existence is a
dangerously defeatist thing to do. After all, don't these messages
waste network resources, such as bandwidth (which is communications
jargon for data-transmission capacity) and storage space, whether
or not people auto-ignore them? Wouldn't it be better to elevate
the level of discourse by vehement protest against these antisocial
practices, thereby making the net better for everyone?
In fact, they do waste network resources, but not to the extent
you'd think. Massively crossposted articles do not take up any more
space, or take any more bandwidth to transmit, than articles that
are posted to only one group. On the other hand, massive
crossposting can sustain cascades and inflame flamewars, thereby
wasting bandwidth and storage space. But keep in mind that any
textual message which is not actually spammed
is a drop in the ocean, a grain of sand, compared to the typical
sound recording or photograph. These nontextual forms of
information make up a large fraction of all Usenet traffic, and
they typically propagate without massive crossposting;
people put them in newsgroups devoted specially to their
distribution. And, actually, most of the drain on network resources
today is associated not with Usenet at all, but with the World Wide
Web.
The biggest waste associated with massive crossposting is not a
waste of network resources, but of readers' precious time
(including connect time with service providers, which sometimes
costs money). News filtering mechanisms can help eliminate that
waste. An attempt to shame an annoyance off the net will, if
anything, increase that waste, because it will amount to picking a
fight. Usenet posters don't like to be told what to do. The
cultural norms of the medium are based on a libertarian, even
anarchic ideal, and on some groups, self-appointed "net cops" are
more despised than flamebaiters. In this case, a technical fix is
really superior to an attempt to reform the social order.
7. But what if I can't filter Usenet?
Of course, some users simply don't have the option of using
filtering, because of the nature of their Usenet access.
College accounts
With academic accounts this usually isn't a problem. Most
college computer accounts supply Unix shell access, and Unix shell
newsreaders have always been the first ones to incorporate
content-oriented features like sophisticated filtering (as opposed
to user interface bells and whistles).
All modern Unix newsreaders that I know of support filtering in
some form, and some support regular expressions. It's likely that
at least one newsreader with killfiles will be available, and
moderately likely that you can use one powerful enough to kill
crossposts. Try typing "man trn" or "man strn" to find out whether
these newsreaders are available, and to read the documentation if
they are.
On-line services
The situation can be different if you access the Internet
through a big commercial on-line service. The basic product of the
big on-line services like AOL is ease of set-up, not the use of
power tools. When you sign up, you get proprietary client software,
which typically includes a primitive Usenet newsreader that doesn't
allow anything as sophisticated as filtering. Often, it is based on
the software used to read the service's internal discussion groups;
but that is not entirely appropriate, since internal discussion
groups are a much more controlled environment than Usenet, with
correspondingly less need for sophisticated filtering. Sometimes,
they don't even let you edit Usenet headers. This was probably
intended to prevent mischief, but it really only makes the user
more vulnerable to mischief by other people!
The on-line services enhance their client software from time to
time, and they now allow the use of some third-party client
software. The situation is gradually improving; the distinction
between on-line services and full Internet service providers is
blurring. You may be able to use a newsreader other than the one
that came with your on-line service.
Reading news via an Internet connection to your computer
Today, though, the best solution is probably to get a
full-fledged Internet connection, via phone-line PPP or a fancy
broadband connection, from an Internet service provider (ISP). Then
you can use powerful, flexible third-party Internet tools that can
be obtained as commercial software, shareware or even freeware.
Often, your ISP will provide you with copies of some of them when
you get your account, and they are also available via FTP or Web
sites (where you can usually find them with search engines). If the
ones provided don't have the features you want, you can always find
others.
Many people who do have PPP accounts or direct Internet
connections read Usenet using the newsreading programs that came
with, or are incorporated into, their Web browsers. Over the years
these clients have gradually gotten some sort of filtering
capability, but they usually still don't have much.
If you're dissatisfied with the newsreading features provided by
your Web browser, consider using a separate, dedicated newsreader
instead. Most of them can now identify URLs embedded in Usenet
posts, and pass them to your Web browser when you want to follow
them.
8. But what about spam?
The trouble with Usenet spam
I mentioned "spam" in a couple of places above. The term "spam"
today mostly refers to junk e-mail, but historically it arose in
the context of Usenet, and it still happens there.
Here, by spam I mean a message that gets posted
separately by automatic means to many newsgroups, without the
use of the proper crossposting mechanism. (The name is derived from
an old Monty Python sketch in which the word "Spam" is shouted ad
nauseam, and is not meant to cast aspersions upon the noble
processed-meat product.) It is also sometimes called "EMP", short
for "excessive multi-posting".
Spam is evil because, unlike crossposts, it gets sent and stored
multiple times at every site. Also, if you subscribe to multiple
newsgroups in which the spam appears, you will see it multiple
times.
Let me be clear: I'm not talking about somebody who manually
posts the same message to a few different newsgroups. That happens,
and sometimes it's a minor annoyance, but the person responsible
usually just doesn't know any better.
(At least one person used to regularly repost the same messages
manually to dozens of newsgroups. Sometimes there were
indications that he was actually typing them in over and over--they
were deranged manifestoes about antigravity, Communists on the
moon, cannibalistic flying-saucer gods, and psychic
houseplants.)
Here, though, I'm talking about someone who posts the same
message separately to a hundred or a thousand
newsgroups. Usually, the culprit is an entrepreneur who hires a
naïve or amoral programmer to post an advertisement all over
Usenet with an automatic script.
Backup methods
Much spam is also crossposted in the usual way to ten
or twenty newsgroups at a time, and this filtering method will get
rid of it. It will not, however, eliminate the most evil variety of
spam, in which the message is only posted to one or two newsgroups
at a time. Fortunately, you can filter out much of this by other,
more ad-hoc means.
For instance, some spam posts are chain letters or ads for
pyramid scams, and they usually contain eye-catching phrases about
making big money (or "$$$") in the Subject: line. One of the
all-time classic annoyances is a seemingly eternal chain letter
called "MAKE.MONEY.FAST". There are many similar variants. (If
you've actually read this far, you're probably too intelligent to
fall for them, so I'll omit the warning.) Searching for strings
like "money" in the Subject: line can be helpful, depending on the
newsgroup. If you have scorefiles, you can even let through benign
posts that contain these strings in the Subject: line but are by
people you like, or which contain something else in the article
header that implies that they might actually be of interest.
Many other spam posts advertise pornography, cosmetic or
weight-loss products, or health nostrums. So that's another
potential source of search criteria.
Here's a really good one that I just figured out. Many spam
posts have a non-alphanumeric character (often a dollar sign,
asterisk, or exclamation point) as the first character in
the Subject: line, so that they will show up at the top of an
alphabetized Subject: list in a newsreader's article selector. So you can kill a
lot of spam by just killing anything whose first character in the
Subject: line is not a letter or number. Actually, I'd allow
quotation marks (single or double) too, but show no mercy if "Re:"
or "Re: " precedes the silly character (so as not to read a bunch
of enraged followups). One way to do this that works in MacSOUP is
to look for the regular expression
^(Re:)? *[^0-9A-Za-z"' ]
in the Subject: line.
Having gotten this to work in MacSOUP, in an earlier version of
this page I blithely generalized to trn without experimentation.
This regexp doesn't work in trn, because of the issue of whether
the word "Subject: " is part of the Subject: line, and the lack of
support for that particular use of parentheses. After much trial
and error, I think that the easiest way to do this in trn is to use
two filters:
/^Subject: *[^0-9A-Za-z"' ]/:j
/^Subject: Re: *[^0-9A-Za-z"' ]/:j
to handle the cases of new subjects and
followups, respectively (this will kill the followups even if the
newsreader never saw the original article, as long as the Subject:
line hasn't been changed; you could also use the threaded
followup-killing command for a more aggressive filter).
In other newsreaders, you might need to modify this regular
expression in other ways. My guess is that the regular expressions
contained in the trn commands (between the slashes) will nearly
always work, except that if the newsreader doesn't consider
"Subject: " to be part of the Subject: line, you would want to
search for the two regexps
^ *[^0-9A-Za-z"' ]
^Re: *[^0-9A-Za-z"' ]
in the Subject: line. (In all of the above, I
am forgiving of leading spaces, to be nice both to posters with
fumbly fingers and to software that might put weird numbers of
spaces after "Re:".)
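Before trusting a filter like this with your news, it is worth
trying it on a few sample subjects. The Python sketch below does that
for the regular expressions above; Python's regexp flavor stands in
for your newsreader's, and the sample subjects and function names are
invented.

```python
import re

# For newsreaders that search only what follows "Subject: ".
BARE = re.compile(r"""^(Re:)? *[^0-9A-Za-z"' ]""")

# The two trn-style patterns, which see the whole header line.
WHOLE_LINE = [
    re.compile(r"""^Subject: *[^0-9A-Za-z"' ]"""),
    re.compile(r"""^Subject: Re: *[^0-9A-Za-z"' ]"""),
]


def junk_bare(subject):
    return BARE.search(subject) is not None


def junk_whole_line(subject):
    line = "Subject: " + subject
    return any(p.search(line) for p in WHOLE_LINE)


samples = [
    "$$$ MAKE MONEY FAST",      # leading dollar signs
    "Re: !!!FREE!!!",           # followup to a spam subject
    "*** READ THIS ***",        # leading asterisks
    "Re: Killfiles",            # ordinary followup
    '"Quoted" title',           # leading quotation mark is allowed
    "  leading spaces",         # spaces alone are forgiven
]

for s in samples:
    print(f"{s!r:24} bare={junk_bare(s)} whole_line={junk_whole_line(s)}")
```

One side effect worth knowing about: a subject that begins with a
bracketed tag, such as "[OT]", also starts with a non-alphanumeric
character and will be junked by these patterns.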
Spam wastes net resources to a much greater extent than massive
crossposting. However, for some of the same
reasons as for crossposts, protesting spam on Usenet itself
rarely does any good; also, in this case you're largely preaching
to the converted.
Fortunately, because there are people working to cancel it with
Usenet "cancelbots", spam probably won't provide you with a great
deal of hardship. It is most visible in whimsically created alt.
hierarchy groups that get essentially no other traffic. Spam and
the cancels aimed at it still make up a large fraction of Usenet
traffic, especially if you count by message rather than by byte,
and it continues to be a huge problem for Usenet as a whole; but,
because of the cancels, the end users aren't seeing much of the
spam.
Junk e-mail is rapidly becoming far more annoying for individual
users. Some commercial mail-reading programs now offer the
equivalent of killfiles, to help you filter your mail too. But that
is another story.