This page offers a glossary of selected terms you may hear in connection with spam. Many of these definitions were gleaned from other sources, but were rewritten by me, so any mistakes or misattributions are most likely my fault and can be reported by contacting me.
A type of advance-fee fraud propagated via e-mail.
See also: Nigerian 419 scam, advance-fee fraud.
The department of an ISP or other network service that deals with complaints of network abuse (including spam).
Typically, although not always, they can be reached at the e-mail address "abuse@xxx.yyy" where "xxx.yyy" is the name of the domain in question. You can use a particular kind of WHOIS lookup to find out for sure.
(also AUP)
The policies published by an internet service provider (ISP) that set forth what it allows (and forbids) its customers to do with the service it provides.
Not all ISPs have a published AUP, but most serious retail ISPs do, and they also make these part of their customers’ contracts (so that the ISP has grounds to terminate service if the customer violates the AUP). AUPs generally include some provisions barring spam; some go as far as to forbid the customer using spam to promote services (like websites) offered using their facilities, even if the spam itself does not pass through their facilities.
A type of fraud (often perpetrated via e-mail) in which the victim is offered large amounts of cash with no apparent risk on his own part. Before he can get any of the money, however, the victim is required to fund bank accounts or pay advance fees that the fraudster will simply steal.
(see also Nigerian 419 scam)
The advance-fee fraudster’s pitch is calculated to appeal to the greed and ignorance of the target, and often seeks to make the target himself think that he’s doing the swindling, and not the other way round. The bite may take any of various forms:
In internet commerce, a person who receives money or other considerations in exchange for using his internet resources to direct business to other firms.
For example, if you enter into an affiliate relationship with a web merchant, you put a special link to the merchant’s website on your own site or in your outgoing e-mails; you may then receive cash, merchandise, or discounts based on the volume of sales your links have generated for the merchant. Many perfectly legitimate and respectable businesses (like amazon.com) use the affiliate model to great advantage, but spamvertisers and mainsleaze businesses often pretend to be ignorant of the spam activities of their affiliates, whom they use to amplify their mail coverage (and distribute the blame).
An alternate name, registered within DNS, for an internet host.
For example, a web server host at capuchin.monkey.foo might be assigned the alias www.monkey.foo (and the bare domain name monkey.foo might also be set as an alias for www.monkey.foo, so that surfers who like to omit the “www.” from their URLs can still reach the main web server host). The aliases and the “real” host name should all be resolvable by DNS, and one or more of them may also appear in a reverse-DNS lookup of the IP address.
(“American Standard Code for Information Interchange”)
One of the oldest and most common character sets used in computing today (it dates from the 1960s), and the one on which internet e-mail standards are based. Any text that appears in an e-mail message must be ASCII, or else must be identified (using MIME) as being of some other character set.
See also: binary.
ASCII assigns an individual character value (or “glyph”) to each of the numeric codes 0 through 127, which makes it a “seven-bit” code (i.e., 2^7 = 128 values), although an individual ASCII character is generally stored in eight bits (one byte or octet) with the high bit masked off (set to zero). ASCII only supports the letters, numerals, and punctuation marks commonly used in American English (plus a set of basic data link and carriage control marks); in order for an e-mail message to be able to contain accented characters, foreign alphabets or glyph-sets, and other specialized characters, the message must be encoded using MIME.
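As a rough illustration of the seven-bit range described above, here is a small Python sketch. The helper names is_seven_bit and mask_high_bit are made up for this example and are not part of any mail standard.

```python
def is_seven_bit(text: str) -> bool:
    """Return True if every character has a code in the ASCII range 0..127."""
    return all(ord(ch) <= 127 for ch in text)


def mask_high_bit(byte_value: int) -> int:
    """Clear the eighth (high) bit of a byte, leaving a seven-bit value."""
    return byte_value & 0x7F


print(2 ** 7)                   # number of distinct seven-bit codes
print(is_seven_bit("Hello!"))   # plain American English text
print(is_seven_bit("café"))     # the accented letter falls outside ASCII
print(mask_high_bit(0xC1))      # 0xC1 with its high bit cleared
```

Text that fails a check like this is the sort that would need MIME treatment before it could travel in a message.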
(see acceptable use policy)
The IP address returned by an authoritative name server for a given host name or alias. The address returned by a local name server is usually not authoritative because it probably came from the local server’s cache and not directly from the authoritative name server (and thus might be outdated by a few minutes or hours).
Reporting spam websites usually requires you to find their authoritative addresses, since spammers can use various tricks to confuse matters.
A name server that is assigned within DNS to be responsible for knowing the IP addresses of all hosts in a given domain.
New web domains, when they are entered into DNS, must be assigned one or more authoritative name servers (usually at least two are provided, preferably in different IP blocks). These authoritative servers are where the rest of the world will be sent when looking for hosts in this new domain.
Local name servers (such as those used by the typical internet user for everyday transactions) get their info from the authoritative name servers for each domain, and usually retain this information for a time in their cache (memory) before “refreshing” it with another authoritative lookup.
If a spammer can set up his own authoritative name servers for his website domains, he can use these to rapidly change the apparent address of his websites among large "botnets" of compromised home computers (see IP rotation).
(short for “automatic responder”)
A machine or software program that indiscriminately sends automatic replies to all incoming messages.
Mail systems often use autoresponders for “vacation messages” or other out-of-office notices, mailing list management systems (and other systems that are controlled by commands sent by e-mail), or challenge-response spam filters. Generally, the autoresponders simply mail their responses back to the from-address of the incoming message, without considering whether this address may have been stolen or forged.
You should not use an autoresponder for any purpose unless absolutely necessary, because these can result in unwanted “blowback” for those whose e-mail addresses have been, eh, “borrowed” by spammers.
A form of MIME encoding that converts large blocks of possibly binary data into larger blocks of ASCII text data (using a numerical technique called “base64”) for safe transmission over e-mail systems.
Since base64 can be applied to pure text as well as binary or mixed data, and since it will disguise the text content of the data, it is often used by spammers to hide the content of their messages from lazy spam filters.
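To see why a lazy filter might miss base64-encoded text, consider this minimal Python sketch using the standard base64 module. The sample phrase is invented for illustration.

```python
import base64

plain = "Buy cheap pills now"

# Encode: the readable words disappear from the transmitted text.
encoded = base64.b64encode(plain.encode("ascii")).decode("ascii")

# Decode: a filter (or a curious reader) can recover the original text.
decoded = base64.b64decode(encoded).decode("ascii")

print(encoded)
print(decoded)
print(len(plain), len(encoded))  # the encoded form is the longer of the two
```

A filter that searches only the raw message for telltale words will not find them in the encoded form; it has to decode the MIME part first.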
A form of content-based spam filter named for its application of the Reverend Thomas Bayes’ famous theorem of conditional probabilities. Such a filter uses Bayesian inference to determine whether or not a message may be spam.
A Bayesian filter looks at individual words (or “tokens”) in a message and “weighs” them for their frequency of appearance in spam versus their frequency of appearance in general, non-spam e-mail; when the weights of all these words are combined using the Bayes equation, the result is an estimate of the probability that a message is spam.
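The weighing and combining of tokens can be sketched in a few lines of Python. This is only an illustration: the token probabilities below are invented, the weight for unknown tokens is an assumption, and real filters derive their numbers from large collections of mail and use more careful tokenizing. The combining formula is one common form of the Bayes calculation, not necessarily the one used by any particular product.

```python
import math

# Hypothetical per-token spam probabilities (values invented for this sketch).
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "free": 0.80,
    "meeting": 0.10,
    "tuesday": 0.15,
}
UNKNOWN_TOKEN_PROB = 0.4  # assumed weight for tokens never seen before


def spam_probability(message: str) -> float:
    """Combine per-token probabilities into one estimate for the message."""
    probs = [TOKEN_SPAM_PROB.get(tok, UNKNOWN_TOKEN_PROB)
             for tok in message.lower().split()]
    if not probs:
        return UNKNOWN_TOKEN_PROB
    spam_score = math.prod(probs)                  # evidence for "spam"
    ham_score = math.prod(1.0 - p for p in probs)  # evidence for "not spam"
    return spam_score / (spam_score + ham_score)


print(spam_probability("free viagra"))
print(spam_probability("meeting tuesday"))
```

The first message scores very close to 1 and the second very close to 0, because each token pushes the combined estimate toward whichever kind of mail it usually appears in.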
One distinctive feature of the Bayesian approach is that it relies on objective analysis of actual data (i.e., large, frequently-updated compendia of known spam mail) rather than on some filter-writer’s supposition of what is in spam and what isn’t, and proponents of Bayesian filters claim very effective filtering with low false positives and negatives.
Many of the filters now found on popular mail clients (such as Apple Mail) are based on Bayesian principles.
The most often cited treatise on Bayesian mail filtering (which provides a very readable overview of the technique) is Paul Graham’s “A Plan for Spam,” found at http://www.paulgraham.com/spam.html.
(also “web bug”)
A special type of hyperlink planted in an HTML-formatted spam message; it is used to signal back to the spammer which of his recipients specifically have opened or responded to the mailing. The beacon URL is designed to be undetectable to the recipient, and may require no effort on the recipient's part in order to "fire."
Beacon URLs usually take the form of invisible images or undefined anchors that have clear or encrypted data appended to them; these data will show up in the spammer’s web server logs or private databases when the link is fetched, and can be used for list laundering as well as for measuring the penetration of a spam remailer’s campaigns (so he can deliver a bigger bill to his spamvertising customer). Most web bugs use the "<IMG SRC=...>" HTML tag, since this tag causes an automatic fetch of the image each time the page is displayed (i.e., you can’t stop it being transmitted).
A typical beacon URL found in a spam message might look something like:
<IMG SRC="http://10.10.10.10/tinypic.gif?email=you@your.isp">
When the image is fetched by your browser or mail program, the data in the call (after the question mark) are recorded in the spammer’s web server logs.
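If you want to see what a beacon like this gives away, you can pull the URL apart yourself. The following Python sketch uses the standard urllib.parse module on the example URL above; nothing is fetched from the network.

```python
from urllib.parse import urlsplit, parse_qs

beacon_url = "http://10.10.10.10/tinypic.gif?email=you@your.isp"

parts = urlsplit(beacon_url)
tracking_data = parse_qs(parts.query)

print(parts.netloc)    # host that would receive the request
print(parts.path)      # the "image" being requested
print(tracking_data)   # data that identifies the recipient
```

Everything after the question mark is what ties the fetch back to a specific recipient.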
In the context of e-mail systems, describes a block of data that should not be treated as ASCII text.
While “binary” has a specific meaning to mathematicians and computer scientists (which I won't cover here), in the world of e-mail processing it generally just means “not ASCII.” Binary data may contain byte values that are outside the legal ASCII range (i.e., greater than 127), and even the legal values the block contains may not be meant to be interpreted as ASCII text.
Binary data may represent text in some other character set besides ASCII, or they may be non-text information (e.g., pictures, MP3 files, or proprietary application documents like spreadsheets). They may also represent machine-language code (like executable programs or libraries). The basic protocols for composing and transmitting e-mail do not allow the transmission of non-ASCII data at any point; in order to include such data in e-mail messages, the data must be encoded to a text-like form using MIME.
(not to be confused with block list)
See also: whitelist.
Blacklisting is pretty ineffective at catching spam since it depends upon the from-address given by the spammer. The spammers seldom re-use these forged or stolen addresses, so adding them to a blacklist could be a big waste of time.
(from the Hollywood Western cliché in which “bad guys always wear black hats”)
Describes an internet provider that does not restrict the spamming activities of its customers, or actively colludes with them (e.g., “I had to quit using that remailing service, their hats were very black”).
People will sometimes ask for a “hat color check” on a particular internet service; they are seeking to know whether the service has a reputation for effective spam control.
See also: white hat.
(also blocking list or DNSBL; not to be confused with blacklist)
A database of IP addresses (or sometimes web URLs) suspected of being involved in spam or other abuse; generally a blocklist is not directly used by end-users, but is instead queried by mail hosts using so-called DNSBL procedures in order to reject or tag probable spam messages at the time they are offered for delivery.
See also: how a mail host uses a block list (elsewhere on this site).
Today, blocklists form the core of the most effective ISP-based spam filtering systems; when effectively managed and properly used, they can enable ISPs to reject or detain 90% or better of incoming spam mail.
There are dozens of blocklists maintained by many organizations (and even individuals), and mail hosts can query these blocklists before allowing untrusted hosts to leave mail messages for them. If the sender’s IP appears in the blocklist, the mail host can reject delivery of the message, so that it will not appear in the recipient’s inbox. Alternatively, the mail host can simply tag the message as spam (perhaps by inserting “[SPAM]” into the subject line) so that the recipient can take action on his own.
Many of the most useful blocklists follow automated or semi-automated procedures to add addresses (e.g., mail sent to spam trap addresses), and also generally delete them from the list according to some automated procedure (e.g., after some time period free from offending mail from the address). This largely removes the elements of personal vindictiveness and human inertia from the picture.
Blocklists are somewhat controversial. Despite the fairly rigorous procedures that most blocklist operators follow for naming addresses or blocks of addresses to their lists, innocent or quasi-innocent providers’ addresses are occasionally added; bitter complaints and even lawsuits often ensue. Blocklist operators are often accused (often by spammers, no surprise) of being “vigilantes” or “censors” although it is the action of the blocklist user, rather than the blocklist operator, that results in denial of service to blocklisted addresses. Blocklist users depend upon the probity and good judgement of the blocklist operators to make sure that the blocking is legitimate.
Automated responses sent indiscriminately to innocent parties in response to incoming spam e-mails.
See also: autoresponder, challenge-response filter.
The term “blowback” describes various kinds of automatically-generated e-mail messages, including vacation messages, challenge-response notifications, MDA bounces, and other “autoresponder” mails. These messages are typically sent back automatically and immediately to the return-path address of each incoming message. In the case of spam, the return paths are almost always false, and may belong to innocent third parties who suddenly find themselves receiving dozens (or more) of mysterious responses to messages they never sent. Blowback is thus an indirect nuisance resulting from spam.
(short for 'robot')
The term "bot" has two different meanings in the context of spamming.
A person who organizes a botnet.
(short for “robot network”)
A term usually applied to a group of open proxy computers (“zombies”) used for spam transmission and related tasks. The operator of a botnet can use its services for his own spamming, or can rent them to other spammers for profit.
(more formally called non-delivery notification (NDN))
See also: reject.
In theory, the mail exchanger should reject the transfer of a message outright if it can’t be delivered (this is sometimes called an "SMTP reject"); however, some MXs simply punt this job to the MDA. Since the MDA can’t “reject” the mail (it’s already been accepted by the MX), the best it can do is to send a bounce message back to the return-path address indicating that the mail is undeliverable. When spammers forge other people’s e-mail addresses into their messages, these people often receive hundreds or thousands of bounce messages from ill-configured mail systems (a form of blowback).
The service offered by a hosting provider who claims to be able to host websites (i.e., for spammers) that will not be shut down due to AUP violations, questionable content or use of spam marketing.
Often, the provider does not actually own the space he is selling, but may be offering space on rotating cadres of spam-friendly ISPs or on secret hosts protected by reverse web-proxy botnets.
(“Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003”)
The identifier of a bill signed into federal law by U.S. President George W. Bush in late 2003, intended to make spamming a federal crime within the United States.
See also: S.1618, sample of a spam with CAN SPAM disclaimers (elsewhere on this site).
CAN-SPAM is supposed to limit spam by making it a federal crime, but in practice it permits opt-out spamming (provided certain conditions are met), attempts to invalidate some tougher anti-spam laws already in effect in individual states, and does not effectively deal with the problem of off-shore spamming and spam website hosting. Although wags started calling it the “YOU CAN SPAM” act almost as soon as it came into effect, the law may well have had some success in curtailing spam, particularly among mainsleaze operations. Some prosecutors are relying upon more concrete indictments, such as wire fraud or computer crime, to go after some of the more recalcitrant spammers.
(“Completely Automated Public Turing test to tell Computers and Humans Apart”)
A mechanism used on many websites to ensure that users are human beings and not software “robots” bent on abuse. The website visitor is required to transcribe some text from a complex, distorted graphical image into a web form before proceeding with his business.
See also: http://www.captcha.net/ (official website for CAPTCHA)
Developed by researchers at Carnegie Mellon University, CAPTCHAs are based on the supposition that automated tools (such as spam harvesters) cannot reliably decode text from distorted images, a task that humans can usually do without much trouble (early CAPTCHA tests were rendered ineffective when attackers found means to get their programs to decode the images; modern tests apply more extensive text distortion and visual noise to make automated decoding much more difficult).
CAPTCHAs are a sort of inverse of the classic Turing test, which is designed to test whether a human can discriminate between a conversationalist who is another human and one that is a machine (only, in the case of the CAPTCHA, it is the machine that must do the discriminating).
CAPTCHAs are deployed to prevent automated activity on websites that are intended only for use by humans. For example, many domain registrars protect their web-based WHOIS lookups with CAPTCHAs, as do many providers of other online tools that might be used for spam harvesting or other forms of abuse. Individual users can also deploy CAPTCHAs to protect web logs or message boards from being flooded with automated traffic.
(probably from “cartoon attorney”)
A loud (but usually empty) threat of legal action made by a spammer against those who seek to limit his operations. The connotation is that a “cartoon attorney” (i.e., an imaginary one) is the only sort of legal counsel at the spammer’s disposal.
Typically, the cartooney is directed at blocklist operators that have listed the spammer’s addresses for blocking, or ISPs that have taken action to block mail from the spammer. The spammer may also target individuals who post information, allegations, or threats in public fora (web logs, usenet groups, etc.).
Most cartooneys are nothing more than crude and blustery attempts to intimidate. However, spammers do sometimes make good on their threats of litigation against individuals, particularly when they think they can deflect attention away from their own abusive activities and onto possibly ill-advised or intemperate responses of the defendants (such as when someone posts threats or exposes private information about the spammer). An excellent reason to remember my spam rules #5 and #6.
An e-mail address to which any incoming e-mails for a domain are delivered if the actual address does not exist. Commonly used with small virtual-domain setups.
For example, mail to info@foo.bar or sales@foo.bar or even nobody@foo.bar might be directed to the catchall address owner@foo.bar if these addresses were not in operation at the foo.bar domain.
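The routing decision in this example can be sketched roughly as follows. The mailbox table and the function name resolve_recipient are hypothetical, invented only to illustrate the idea; real mail systems configure this in their own ways.

```python
# Hypothetical mailbox table for the example domain foo.bar.
MAILBOXES = {"owner@foo.bar", "webmaster@foo.bar"}
CATCHALL = "owner@foo.bar"  # pass catchall=None below to disable it


def resolve_recipient(address: str, catchall=CATCHALL):
    """Return the mailbox that receives mail for `address`, or None to reject."""
    address = address.lower()
    if address in MAILBOXES:
        return address                  # a real mailbox: deliver normally
    if address.endswith("@foo.bar") and catchall is not None:
        return catchall                 # unknown local address: use the catchall
    return None                         # no catchall (or wrong domain): reject


print(resolve_recipient("nobody@foo.bar"))
print(resolve_recipient("nobody@foo.bar", catchall=None))
```

With the catchall switched off, mail to a non-existent address is simply refused instead of landing in someone's inbox.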
While catchall addresses are a well-intentioned feature (ensuring that any mail directed to a domain can be received and directed to someone in charge, even if the address is misspelled), they cause problems when mixed with spam and other forms of mail abuse. For example, if a spammer decides to forge non-existent addresses in the foo.bar domain into his messages, then hundreds or thousands of bounces for all of these messages will be sent back to the victim's catchall address.
If you operate a private domain, you would be well advised to have your hosting or mail service disable catchall addresses unless you have a good reason to use them.
The Coalition Against Unsolicited Commercial E-mail (http://www.cauce.org/), an anti-spam organization.
(commercial electronic mail message)
A synonym for commercial e-mail (which may or may not be spam), coined in the CAN-SPAM act, and used virtually nowhere else.
A fraudulent e-mail (or postal letter, fax, etc.) that invites you to send money to people on a list, and then add your own name to the list and forward the modified message to others (so that they can send money to you).
See also: Chain letter examples #1 and #2, analysis of chain letters (all elsewhere on this site).
The typical chain letter usually promises extravagant returns to the participant (although the originators are the only ones likely to make any significant money, owing to the laws of geometric propagation and market saturation). The chain letter often uses the exchange of trivial goods for money (such as “e-books” on how to spam) as a sort of fig-leaf, although this doesn’t prevent it from being considered mail fraud if it involves postal mail at any point (e.g., for sending the funds). Considered by law enforcement folks to be a form of gambling, the chain letter is a very old scam that predates e-mail. Sometimes also known as an MMF (more information here).
(also “C/R filter”)
An “active” sort of spam filter that works by blocking unknown senders’ mail from delivery until they prove that they are well-intentioned human beings and not spammers or robots.
See also: my comments on challenge-response filters (elsewhere on this site).
Challenge-response filters have gained some traction in recent years among individuals and even a few ISPs. In the C/R scheme, e-mail from an untrusted source is temporarily withheld from delivery to the intended recipient; a “challenge” message is automatically sent back to the from-address of the message, asking for a trivial “response” (e.g., replying to the challenge message, or visiting a web link), and if such a response is received, the sender’s message (and usually any further message from the sender) is released for delivery. The C/R technique depends for its effectiveness upon the fact that most spams have invalid, stolen, or unmonitored from-addresses, and so challenging them will likely never yield a response.
Challenge-response filters may seem to work for their users, but they are a nuisance to almost everyone else (with the exception of the spammer, who suffers no more than to lose a delivery or two).
(originated among usenet anti-spammers)
A very derogatory term for a particularly obvious or maladroit spammer (usually of the “network marketing” or “MLM” variety).
The chickenboner, according to a popular imaginary word-portrait, is a “fat bald redneck” who taps away at his computer surrounded by buckets of rotting carry-out chicken bones. This term was intended to apply to the sort of stupid, inexperienced, or low-rent spammers who try to pass themselves off (falsely) as successful, wealthy, and credible business people, but it may not apply to many in the current cohort of professional spammers (who often really are successful and wealthy, and anything but stupid).
(also “closed-loop opt-in”)
Characteristic of an e-mail list that requires you first to request to be added, and then to confirm separately that you want to be added, before you can begin receiving mail.
See also: all about opt-in and opt-out (elsewhere on this site), opt-in, opt-out.
Confirmed or “closed-loop” opt-in is the best way to run an ethical mailing list, since it virtually eliminates the possibility that a recipient could be sent mail when he did not ask for it.
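The two-step signup can be sketched in Python like this. It is a bare-bones illustration, assuming an in-memory store and a random token that would be mailed to the address in question; the function names are hypothetical, and a real list manager would also expire old tokens and actually send the confirmation mail.

```python
import secrets

pending = {}          # confirmation token -> address awaiting confirmation
subscribers = set()   # addresses that have completed both steps


def request_subscription(address: str) -> str:
    """Step 1: record the request and return the token to mail to `address`."""
    token = secrets.token_urlsafe(16)
    pending[token] = address
    return token


def confirm_subscription(token: str) -> bool:
    """Step 2: add the address only if a valid token comes back."""
    address = pending.pop(token, None)
    if address is None:
        return False
    subscribers.add(address)
    return True
```

Because the token is sent only to the address being signed up, someone who enters another person's address cannot complete the second step on that person's behalf.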
A spam filter that depends upon analysis of a message’s content (in the body or subject line) to determine whether the message may be spam.
See also: Bayesian filter, routing filter.
The simplest content filters may just look for certain words or patterns (like “viagra” or “S.1618”), but these are not as effective as content filters using more complex techniques such as Bayesian filtering.
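A word-and-pattern filter of the simplest kind might look something like the Python sketch below. The pattern list uses only the two examples named above, and looks_like_spam is a hypothetical name; this is an illustration, not a recommended filter.

```python
import re

# Patterns taken from the examples in the text; a real list would be longer.
PATTERNS = [
    re.compile(r"viagra", re.IGNORECASE),
    re.compile(r"S\.1618", re.IGNORECASE),
]


def looks_like_spam(subject: str, body: str) -> bool:
    """Flag the message if any pattern appears in the subject or the body."""
    text = subject + "\n" + body
    return any(pattern.search(text) for pattern in PATTERNS)


print(looks_like_spam("Cheap VIAGRA", ""))
print(looks_like_spam("Lunch?", "See you at noon"))
print(looks_like_spam("Cheap v1agra", ""))   # a trivial misspelling slips past
```

The last call shows the weakness: a small change in spelling is enough to defeat a fixed pattern.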
(“cascading style sheet”)
A technique used in website design to provide precise control of the placement and appearance of text, images, and other elements on a web page.
Some of the more advanced spammers employ CSS tricks to obfuscate their messages or to plant web-bugs.
(Dynamic Host Configuration Protocol)
A protocol (defined in IETF RFC 2131) whereby a host computer connects to a server and receives assignment of an IP address, along with other useful items (such as the names of network gateways and name servers). DHCP automates the addition of computers (and other devices) to local networks, and makes efficient dynamic addressing possible.
See also: dynamic addressing, IETF RFC 2131
(also known as “directory harvest attack” (DHA) or “MX probe”)
A type of attack in which the spammer repeatedly attempts to deliver empty messages to large numbers of addresses within a given domain in order to learn which among these addresses are deliverable.
See also: How a dictionary attack works (elsewhere on this site).
The dictionary attack is a very frequently used means by which spammers can collect deliverable addresses for their spam lists. The spammer can use a dictionary attack to test any addresses he has for a domain, even those he may simply have guessed at. For example, the spammer could contact the MX host for the domain xxx.yyy and try to deliver a message to “jsmith@xxx.yyy”; he would have a fair probability of actually reaching someone at this address. It’s as though the spammer is throwing a whole “dictionary” of possible addresses at the MX host.
Typically, the dictionary attack makes itself known to end-users in the form of strangely blank messages without content or subject line.
("domain information groper")
“dig” is a common network utility that lets users interact directly with name servers to get (very) detailed information about domains and hosts. It can also find the mail exchangers used to send mail to addresses within a domain.
See also: How to use dig (elsewhere on this site).
A mail-delivery technique used by spammers in which mail is sent directly from the spammer's computer to the recipient’s mail exchange (MX) host, bypassing any intermediate mail transfer agents.
See also: popular spammer tricks, spam transmission techniques (both elsewhere on this site).
The direct-to-MX technique allows message origins to be disguised (although not completely), and keeps the outgoing mail from being monitored or detected by the sending ISP’s mail system. Direct-to-MX mailing can be done from a simple dialup or broadband account belonging to the spammer, or by open proxy machines controlled by the spammer. The vast majority of spam is now sent via direct-to-MX, usually from open proxies.
(see domain name service)
(“DNS block list”)
A spam blocklist that can be accessed (by a mail host) with the same system calls used to contact name servers.
See also: blocklist.
The DNSBL mechanism makes it easy for mail host operators to include blocklist checking in their mail handling procedures. When the mail host queries a DNSBL for a particular host name or address, it receives not an IP address, but a coded answer resembling an IP address. The code indicates the blocklist’s opinion of the address in question, in particular whether the address is implicated in spam and should not be trusted.
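Here is a rough Python sketch of such a query. It assumes the usual DNSBL convention of reversing the octets of an IPv4 address and appending the blocklist's zone name; the zone dnsbl.example.org is a placeholder, not a real blocklist, and the function names are made up for this example. The meaning of the coded answer varies from list to list, so check the operator's documentation.

```python
import ipaddress
import socket

# Hypothetical blocklist zone; substitute the zone of a real DNSBL.
DNSBL_ZONE = "dnsbl.example.org"


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the host name to look up: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def dnsbl_lookup(ip: str, zone: str = DNSBL_ZONE, resolve=socket.gethostbyname):
    """Return the blocklist's coded answer, or None if the address is unlisted."""
    try:
        return resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # The name does not resolve in the zone: treat the address as unlisted.
        return None


print(dnsbl_query_name("192.0.2.99"))
```

The resolve parameter exists only so that the lookup can be tried out with a stand-in function instead of a live name server.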
Within the public internet, a group of hosts that share a common domain name, as well as common authoritative DNS and mail-exchanger services.
The term “domain” is also used within proprietary Microsoft Windows networks, but has a slightly different meaning in this case (which is beyond the scope of this glossary).
A name (e.g., “rickconner.net”) for a group of individual host computers, each of which will have a host name and possibly one or more aliases, as well as common use of authoritative name servers and mail exchangers.
The domain name is in some ways analogous to a person’s family name (i.e., it is usually the same for others in his family), while the fully-qualified host name or alias (e.g., tiger.rickconner.net) is analogous to the person’s full name (it has a first portion to distinguish it from other hosts in the domain).
Public internet domain names must be registered by their owners with an ICANN-accredited domain registrar in order for their hosts to be included in the domain name service. Sometimes, information about spammers can be uncovered by using a WHOIS tool to look up information about their web domains.
(also DNS)
A distributed database, accessible everywhere on the public internet, that can convert or “resolve” host name references to IP addresses (or vice-versa), identify mail exchanger hosts for a domain, and provide other useful features.
DNS queries are made to a type of internet host known as a name server. Typically DNS queries are done invisibly to the end-user by his applications (web browsers, mail programs, etc.), but “manual” DNS-related tools such as nslookup or dig are often used by investigators for tracing and identifying the sources of spam messages.
A company that is accredited by ICANN to provide domain registration services for the public.
See also: looking up domain registration info for a spam domain (elsewhere on this site).
Registrars are required to adhere to the terms of an ICANN agreement regarding the process for registering domains, including the collecting and posting of registrant information, but they are otherwise free to set their own policies regarding the types of domains they offer, their prices, terms of service and payment, etc.
Domain registrars vary widely in their attitudes toward spam. Some will cancel domain registrations wherever the domains are shown to have been used in spam, while others choose not to police their customers in this fashion. Still others apparently seek out the spam trade, allowing customers to register dozens or hundreds of domains automatically and in bulk, wilfully accepting incomplete or bogus registrant info, and taking no responsibility regarding the use of these domains in spam or other forms of abuse.
(also dynamic IP)
A practice whereby internet providers assign IP addresses to users’ computers on demand, and only for a limited period of time. Dynamic addressing allows a provider to serve a large group of users with a smaller number of addresses, on the assumption that only a fraction of users will be online at any point in time. The previous practice of permanently assigning IP addresses to clients has been given the retronym static addressing.
See also: DHCP
If your ISP uses dynamic IP (as do most retail providers these days), then when you sign on to your service your computer will request assignment of an IP address (usually via the DHCP protocol). You’ll get a “lease” on an address that is good only for a limited time (say, 24 hours). Once your lease runs out, or if you sign back onto your service after having disconnected, your computer will need to ask for another address assignment. The upshot of this is that a given computer will have an IP address that changes from day to day. From the user’s point of view, this process is completely transparent and tends not to cause any problems.
Dynamic IP comes into the study of spam because most spammers now use botnets of malware-infested home computers (“zombies”) to send their mail (and perform other useful tasks for them). The majority of these botnet hosts, one imagines, are on dynamic IP networks, so their addresses are constantly changing. So, while spam filters and tracing tools can very easily detect the IP addresses from which spam originates, they cannot tie these addresses back to specific computers or users. On the other hand, many spam filters can determine whether an IP address belongs to a dynamic-IP “pool,” and thus may not belong to a bona-fide mail host (which should have a static IP address outside of the pool), and this capability is often used to spot spam attacks.
(also “mailing list”)
A message broadcast system that uses e-mail to deliver its messages; it generally consists of a closed group of members who send their messages to a central e-mail address (i.e., they “send to the list”); a computer program monitors this address and broadcasts any incoming mail (from members) out to all other members of the list.
The software programs typically used to run mailing lists include majordomo and LISTSERV, which names are sometimes seen in conjunction with mailing lists (e.g., “join the tractor-pull LISTSERV”).
A good mailing list is usually quite secure against spam, because only bona-fide members can send mail to the list, and membership usually requires a voluntary signup and verification (and may even be restricted by invitation only). It is also easy to kick miscreant members off such a list should they begin spamming. However, such lists are vulnerable to harvesting if they post publicly-accessible archives (e.g. on the web).
It is generally easier to control and moderate traffic on mailing lists than on usenet groups.
Many old-style LISTSERV/majordomo mailing lists have migrated to web-based message boards, or hybrids of mailing-list and web-board (e.g., Google Groups).
A form of MIME encoding that operates on visible portions of an e-mail header (such as an address nickname or a subject line).
See also: base64, MIME, quoted-printable.
Encoded-word encoding allows non-ASCII text to be used in address nicknames and subject lines of e-mail messages; it does this by encoding this text to a safe ASCII-like form. The feature permits address nicknames and subject lines to be rendered in non-ASCII character sets for the convenience of those who use such character sets; it also allows spammers to disguise portions of their messages during transit.
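To see what an encoded-word looks like once decoded, here is a minimal sketch using Python's standard email.header module. The two subject lines are made-up examples, and decode_encoded_words is simply a helper name chosen for this illustration.

```python
from email.header import decode_header, make_header


def decode_encoded_words(raw_value: str) -> str:
    """Turn a header value containing encoded-words into readable text."""
    return str(make_header(decode_header(raw_value)))


# Made-up subject lines: the same text in the "B" (base64) and
# "Q" (quoted-printable-like) flavors of encoded-word encoding.
base64_subject = "=?UTF-8?B?Q2Fmw6kgb2ZmZXI=?="
q_subject = "=?ISO-8859-1?Q?Caf=E9_offer?="

print(decode_encoded_words(base64_subject))  # Café offer
print(decode_encoded_words(q_subject))       # Café offer
```

Most mail clients perform this decoding silently, which is why the disguise is only effective while the message is in transit.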
See also: base64, HTML character entities, MIME, quoted-printable, URI escape.
Escapes are often used (legitimately) to modify data that might be confusing to computers in plain form (e.g., so-called URI escapes are applied to complex URLs quoted in HTML markup). Spammers can use escapes even when they aren’t needed, in order to disguise portions of the message or the origins of their network resources.
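The difference between a needed escape and a gratuitous one can be shown with URI escapes. In this rough sketch the URL is a made-up example and escape_everything is a hypothetical helper; quote and unquote are the standard functions from Python's urllib.parse module.

```python
from urllib.parse import quote, unquote


def escape_everything(text: str) -> str:
    """Percent-escape every byte, whether or not it needs escaping."""
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


# Legitimate use: only the characters that would confuse a parser are escaped.
print(quote("a b&c"))  # a%20b%26c

# Gratuitous use: a harmless word is escaped purely to hide it from a casual reader.
disguised = "http://example.com/" + escape_everything("pills")
print(disguised)           # http://example.com/%70%69%6C%6C%73
print(unquote(disguised))  # http://example.com/pills
```

Undoing the escapes before examining a URL is usually the first step in seeing where a spam link really leads.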
Microsoft’s trade name for its proprietary e-mail management system, popular among large corporations or institutions that are Microsoft shops.
Just to put things in perspective, Exchange is the service that your company uses to run its e-mail system; it is not the client (program) you run on your own computer to pick up your mail. Exchange is supported primarily by Microsoft’s Outlook mail client, and by an increasing number of non-Microsoft clients as well. Exchange generally does not use open standard protocols (SMTP, POP, etc.) for internal relaying and delivery of mail, although it can be made to use SMTP to transfer messages to and from external domains.
Spam recipients who use Exchange-based mail services are at a relative disadvantage in tracing spam messages, since Exchange clients often make it difficult to see SMTP headers, and will often munge the format of messages and of complex MIME-based e-mail bodies.
A spam message that is wrongly tagged as non-spam and allowed to pass through a spam filter.
See also: false positive.
Spammers are always on the lookout for ways to make their messages pass the filters as false negatives. A low rate of false negatives is one characteristic of an effective spam filter.
A non-spam e-mail message that is wrongly tagged as a spam message by a spam filter.
See also: false negative.
False positives can be more worrisome than false negatives because they might represent important personal communications that the recipient will never see unless he carefully examines the messages trapped by his filter. Unfortunately, however, the more spam a user receives each day, the greater is the likelihood that a false-positive message could be overlooked in the noise. A low rate of false positives is a measure of the effectiveness of a spam filter.
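The two error rates can be worked out from simple counts. The sketch below assumes you already know how many messages of each kind the filter handled; the function name and the sample numbers are invented for illustration.

```python
def filter_error_rates(spam_caught, spam_missed, ham_passed, ham_flagged):
    """Return (false-negative rate, false-positive rate) for a spam filter."""
    # False negatives: spam that slipped through, as a share of all spam.
    false_negative_rate = spam_missed / (spam_caught + spam_missed)
    # False positives: honest mail trapped, as a share of all honest mail.
    false_positive_rate = ham_flagged / (ham_passed + ham_flagged)
    return false_negative_rate, false_positive_rate


# Invented counts: 1,000 spams (50 missed) and 200 honest messages (2 trapped).
fn_rate, fp_rate = filter_error_rates(950, 50, 198, 2)
print(f"false negatives: {fn_rate:.1%}, false positives: {fp_rate:.1%}")
```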
Describes the behavior of a web-hosting botnet in rotating the apparent addresses of the spam website on an extremely frequent basis (e.g., every five minutes or more often).
See also: botnet, reverse web proxy, rotating IP, zombie.
The e-mail address that appears in the “For” clause of a routing line in the typical e-mail header; it indicates the e-mail address to which the sender intends the message to be delivered.
See also: to-address, SMTP message transmission example (elsewhere on this site).
The for-address, where present, indicates the individual e-mail address to which the message originator wished to send the message (using the SMTP “RCPT TO” command). It may be different from the to-address, particularly in spam mail (which explains why you can receive spam that does not appear to have been addressed to you). The for-address is typically not displayed to the user unless he elects to view the full mail headers for the message.
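If you have the full headers in hand, the for-address can be picked out of a routing line with a simple pattern match. This is a sketch only: the routing line is a made-up example, for_address is a hypothetical helper, and real routing lines vary enough in format that the pattern will not catch every case.

```python
import re

# Matches the "for <address>" clause of a routing ("Received:") line.
FOR_CLAUSE = re.compile(r"\bfor\s+<?([^\s<>;]+@[^\s<>;]+)>?", re.IGNORECASE)


def for_address(routing_line: str):
    """Return the for-address from a routing line, or None if it has none."""
    match = FOR_CLAUSE.search(routing_line)
    return match.group(1) if match else None


sample = (
    "from mail.example.net (mail.example.net [192.0.2.10]) "
    "by mx.example.org with SMTP id abc123 "
    "for <alice@example.org>; Mon, 1 Jan 2024 00:00:00 +0000"
)
print(for_address(sample))  # alice@example.org
```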
An SMTP mail header that has been manipulated by a spammer in order to disguise the message’s true origins.
Common header-forging tricks include use of phony from- and return-path addresses, bogus HELO mail host names (often simple domain names like “aol.com”), and fictitious routing lines to give a false history to the message. The vast majority of spam messages make use of forged headers, although they are explicitly prohibited by the AUPs of most ISPs (as well as by the CAN SPAM law).
The name of Matt Wright’s popular CGI-based mailback application for websites (found at http://www.formmail.com/).
“Formmail” has come to be a generic term for mailback apps in general.
The e-mail address that appears in the visible “From:” field of the customary e-mail header. It is supposed to be (but isn’t always) the e-mail address of the person who sent the message.
See also: return-path address.
The from-address (for technical reasons) need not be that of the ultimate sender of the mail; in nearly all spams, this address is absent, bogus, or stolen from an innocent party.
(or “greylisting;” less “black” than “blocklisting”)
A technique used by a mail exchanger host to defend against spam deliveries by temporarily rejecting all mail from unrecognized sources.
See also: Wikipedia article on graylisting.
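As we know, spammers think that following SMTP rules is for other people; graylisting is an elegant technique that uses their own disdain against them. A bona-fide mail host that receives a temporary rejection will queue the message and try again after a delay, as SMTP requires, and the retry is then accepted; spam-sending software typically never retries. Graylisting thus works to discourage spam deliveries at the (theoretically slight) cost of delaying honest mail.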
Graylisting requires a bit of extra work by the MX (particularly for large mailing services that must coordinate among many MX hosts), but possibly not as much CPU time as would be required to process and filter the spam if it were not rejected. Graylisting can result in delays in delivery of honest mail that may be objectionable in some cases (e.g., when waiting on an e-mail confirmation of a web transaction). Still, many mail services are pleased with the performance of graylisting (as are, one imagines, their users).
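The bookkeeping behind graylisting can be sketched as follows. This is a simplified illustration under stated assumptions, not any particular product's implementation: the Graylist class is hypothetical, the sender is identified by the common (client IP, sender address, recipient address) triplet, and the five-minute minimum delay is an arbitrary choice.

```python
class Graylist:
    """Temporarily reject mail from unrecognized senders until they retry."""

    def __init__(self, min_delay: float = 300.0):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.first_seen = {}        # triplet -> time of first delivery attempt

    def check(self, client_ip: str, mail_from: str, rcpt_to: str, now: float) -> str:
        key = (client_ip, mail_from.lower(), rcpt_to.lower())
        # Record the first attempt; later attempts reuse the original timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "250 accepted"
        return "451 try again later"


graylist = Graylist()
attempt = ("192.0.2.10", "friend@example.com", "you@example.org")
print(graylist.check(*attempt, now=0.0))    # first attempt: temporary rejection
print(graylist.check(*attempt, now=600.0))  # honest retry ten minutes later: accepted
```

A sender that never comes back after the first rejection simply never gets its mail delivered, which is the whole point.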
Graylisting has spawned a couple of related tricks that also exploit the fact that spam mail senders don’t follow the rules (of SMTP). In the technique curiously called nolisting, the ISP deliberately sets up a dummy or non-existent MX as its primary mail host; bona-fide mailers will follow SMTP and automatically move to the secondary MX when the primary does not work; spammers (it is supposed) will move on without trying the secondary MX. On the other hand, there is evidence to suggest that spammers often go to the secondary servers first (in the hope that these will be less well-protected against abuse), so nolisting may not be as effective as first thought. Still, it is very cheap protection.
(by contrast with “SPAM”)
E-mail that is not spam (e.g., “Since using the new filter, my spam-to-ham ratio has dropped sharply.”).
I don’t really like this term, since it carries the somewhat elitist implication that ham (the food) is somehow superior to SPAM (the food).
Collecting e-mail addresses for a spam mailing list.
See also: scraping, dictionary attack
Spammers employ various techniques to gather potentially-deliverable addresses for their lists: collecting them from websites or usenet groups using spambot software, testing them out on MX hosts using dictionary attacks, or simply asking users to surrender them (by offering services like electronic greeting cards, online dating, and the like).
A piece of random text appearing in the subject line or body of a spam message, designed to elude a basic form of spam filtering on outgoing mail hosts.
It used to be that outgoing mail hosts would try to detect bulk mail by distilling each outgoing message into a compact string called a “hash.” By comparing the hashes of successive messages, a host could decide whether they were identical; if a given user submitted too many identical copies of the same message, the host could stop all deliveries for that user and end the spam run. Spammers soon figured out that by slightly altering the message (using a mutable “hashbuster” string) every few iterations, they could get around this process. Hashbusters frequently appeared in the subject lines of messages, or were appended to the ends of the message bodies. Hashbusters are no longer as common or as useful as they once were, since spammers now use direct-to-MX mailing (which does not incur hashing) rather than risk interacting with outgoing MTA hosts.
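Here is a small sketch of why a hashbuster defeats this kind of check. The choice of SHA-256 is an assumption made for illustration (the mail hosts in question may have used any hashing scheme), and message_hash and the sample text are invented.

```python
import hashlib


def message_hash(subject: str, body: str) -> str:
    """Distill a message into a compact fingerprint (SHA-256 is an assumption)."""
    return hashlib.sha256(f"{subject}\n{body}".encode("utf-8")).hexdigest()


subject = "Great offer"
body = "Buy now!"

# Two identical copies produce the same hash, so a bulk run is easy to spot.
print(message_hash(subject, body) == message_hash(subject, body))  # True

# Appending a random-looking hashbuster changes the hash completely.
print(message_hash(subject, body) == message_hash(subject, body + " xqzv81"))  # False
```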
(see mail header)
(from the SMTP command of the same name)
The host name given in an SMTP transaction by a mail host that wishes to leave mail for another mail host.
SMTP does not require that the HELO be authentic or correct, and spammers invariably use bogus or forged HELO names (like “aol.com” or “BADF00”). Since the numeric IP address of the sending host is known to the receiving host (it comes from the basic IP socket transaction), it can be matched to the HELO name using DNS lookups; if the match fails, it’s pretty conclusive evidence of header forgery.
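The HELO check described above might be sketched like this. Everything here is illustrative: helo_matches_peer, dns_addresses, and the FAKE_DNS table are hypothetical names, and the stand-in resolver exists only so the example can run without touching the network. Only a forward lookup (name to address) is shown.

```python
import socket


def dns_addresses(name: str) -> set:
    """Look up the addresses a host name resolves to (needs network access)."""
    return {info[4][0] for info in socket.getaddrinfo(name, None)}


def helo_matches_peer(helo_name: str, peer_ip: str, resolve=dns_addresses) -> bool:
    """Return True if the HELO name resolves to the address that connected."""
    try:
        return peer_ip in resolve(helo_name)
    except OSError:
        # The name does not resolve at all, so it cannot match.
        return False


# Stand-in resolver with made-up data, so the sketch runs offline.
FAKE_DNS = {"mail.example.net": {"192.0.2.10"}}


def fake_resolve(name: str) -> set:
    if name not in FAKE_DNS:
        raise OSError(f"no such host: {name}")
    return FAKE_DNS[name]


print(helo_matches_peer("mail.example.net", "192.0.2.10", resolve=fake_resolve))  # True
print(helo_matches_peer("aol.com", "192.0.2.10", resolve=fake_resolve))           # False
```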
(from the spycraft term meaning a well-baited trap for enemy agents or targets)
A vulnerable host, placed on the network for the specific purpose of attracting attack (under controlled conditions) by hostile parties. The honeypot is usually used to detect and study various types of system attacks (not just spam).
See: http://www.honeypots.net/.
Honeypots are used in the study of open proxies and other spam-related phenomena; while not exactly honeypots, teergruben and spam traps are very closely related.
See also: about the 'host' command (elsewhere on this site).
The text name assigned to a host computer (e.g., “horace.rickconner.net”).
See also: domain name.
Locally (i.e. on a local area network), a host might simply be known by a single name (which can be set or retrieved on Unix systems using the "hostname" command), for example “horace.” When called from outside the local area, the computer will need to have its domain name appended (e.g., “horace.rickconner.net”). Such a “fully-qualified” host name is thus analogous to a person’s first name (which is his own) and last name (which he shares with others in his family). In addition to its “proper” host name, a host can also be called by one or more aliases.
An ISP that provides hosting services (disk space, network access, software tools, etc.) to individuals or companies that wish to operate websites.
See also: bulletproof hosting.
Most spammers use websites to collect orders and inquiries, and these usually have nothing to do with the mail service(s) that they use (or steal) to send their spam. Some hosting services have strict acceptable-use policies that prohibit customers using spam (even from outside services) to promote their sites; many other providers, however, don’t much care. Still other hosting services specialize in the spam trade (see bulletproof hosting).
(“hypertext markup language”)
The “markup language” used to create web pages that include formatted text, embedded images, client-side automation, and other features.
See also: HTML 4.0 specification at http://www.w3.org/.
Most modern e-mail client programs can also understand and render HTML-formatted mail messages, and most spams are now delivered in HTML form (which offers many advantages to the spammer, such as the use of beacon URLs).
A type of character escape used in HTML markup, in which certain unusual, non-Latin, or “reserved” characters are represented using a special notation (e.g., “&lt;” for the less-than sign, or “&#66;” for “B”, 66 being the decimal character code for “B” in the Latin-1 character set).
Occasionally, spammers will use HTML CEs to disguise the contents of their messages during transit.
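A short sketch shows both directions of this escape, using the unescape function from Python's standard html module. The to_numeric_entities helper and the sample word are invented for illustration.

```python
import html


def to_numeric_entities(text: str) -> str:
    """Rewrite every character as a decimal HTML character entity."""
    return "".join(f"&#{ord(char)};" for char in text)


print(html.unescape("&lt;"))   # <
print(html.unescape("&#66;"))  # B

# A harmless word disguised character by character, then recovered.
disguised = to_numeric_entities("Buy")
print(disguised)                 # &#66;&#117;&#121;
print(html.unescape(disguised))  # Buy
```

A mail client renders the disguised text normally, while a naive filter that scans the raw markup for keywords sees only the numeric codes.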
(“hypertext transfer protocol”)
The protocol (defined in IETF RFC 2616) that allows web browsers to fetch files (including HTML documents, image files, etc.) from remote servers.
See also: IETF RFC 2616.
HTTP is the everyday workhorse of the World Wide Web. Spammers use HTTP in conjunction with their HTML-format e-mails for a variety of malign purposes.
(“Internet Corporation for Assigned Names and Numbers”)
The international non-profit corporation set up to administer various technical features of the internet (including creation of top-level domains, domain registration, protocol identifiers, etc.). They are, in effect, the czars of the Internet, although they tend to delegate most of their work to businesses and private individuals.
See also: ICANN website http://www.icann.org/.
(“Internet Engineering Task Force”)
An arm of the Internet Society (ISOC) (http://www.isoc.org/) that oversees the technical development of internet services. They exert their authority through the development of requests for comment (RFCs) and standards (STDs) that define how these services are to work. Most common internet services, including SMTP (e-mail) and HTTP (web services) are defined in IETF RFCs and standards.
See also: IETF website http://www.ietf.org/.
Spam mailings in which the “pitch” is embedded in a graphic image (usually a GIF or JPEG, sometimes a PDF). It is far more difficult to extract and analyze message content from a raster image than from a simple string of text, so this technique helps spammers get their messages past content-based filters.
The images are generally embedded directly into the e-mail packet as MIME attachments, which greatly increases the size of the message packet. This technique was once widely used, particularly by stock spammers (who used a great deal of “boilerplate” text that was otherwise easily spotted and detained). With the development of spam filters incorporating efficient optical character recognition (OCR) technology, this technique seems to have gone into a hiatus.
See also: Page about stock spam with examples of image spam (elsewhere on this site).
(“internet message access protocol;” formerly “interactive mail access protocol”)
A protocol defined in IETF RFC 3501 that can be used by mail clients to pick up e-mail messages from a mail delivery agent.
See also: IETF RFC 3501. POP.
IMAP is less commonly used than the similar post office protocol (POP3). These protocols don’t come up very often in the study of spam, because they do not figure in the propagation of spam.
(SpamCop terminology)
A party whose computers, networks, or other resources have been used without his knowledge or permission to transmit or support spam. Such parties should not be accused of spam or abuse insofar as it is possible to avoid doing so.
See also: SpamCop Wiki entry on Innocent Bystanders.
Classic examples of innocent bystanders include:
Innocent bystanders may not be to blame for the spam, but they may often be negligent in failing to secure their computers and networks against exploitation by spammers and other crooks.
(also ISP)
A business that offers internet services (such as home internet hookups, commerce websites, etc.).
The term can also generally refer to a school, government agency, private employer, or other institution that provides such services for its members, employees, or constituents.
(internet protocol)
The set of communications procedures (officially defined in IETF RFC 791) that underlie the internet.
See also: IETF RFC 791.
IP provides for the transfer of packets of data between computers on large, heterogeneous networks; additional “higher-level” protocols (such as TCP, HTTP, and SMTP) “sit on top” of IP to define what these packets must look like and how they are to be processed once received.
A numerical address, usually rendered as a so-called “dotted quad” (e.g., 192.168.0.3) that uniquely identifies a computer (or other device) on an IP network (like the public internet). In other words, the IP address is the address at which a particular machine can be reached for IP communications.
The dotted-quad is the customary or “canonical” form for standard IPv4 addresses, although they can also be given in other numerical forms (which spammers sometimes use in order to deflect scrutiny from their resources). The new IPv6 addressing scheme uses other forms to represent its much larger addresses.
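The alternative numerical forms are easy to convert back to a dotted quad with Python's standard ipaddress module. In this sketch, to_dotted_quad is a hypothetical helper that handles only plain decimal and hexadecimal integers; other disguises would need more work.

```python
import ipaddress


def to_dotted_quad(text: str) -> str:
    """Normalize an IPv4 address given as a dotted quad, decimal, or 0x-hex integer."""
    try:
        # int(text, 0) accepts "3232235523" as well as "0xC0A80003".
        return str(ipaddress.IPv4Address(int(text, 0)))
    except ValueError:
        # Not a single integer, so treat it as an ordinary dotted quad.
        return str(ipaddress.IPv4Address(text))


print(int(ipaddress.IPv4Address("192.168.0.3")))  # 3232235523
print(to_dotted_quad("3232235523"))               # 192.168.0.3
print(to_dotted_quad("0xC0A80003"))               # 192.168.0.3
```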
The IP address is independent of any host names or domain names that a machine might have; the linkage between addresses and names is maintained in the domain name service.
The “traditional” form of IP addressing, which uses a 32-bit register that can contain any of up to about 4,300 million possible addresses. The popularity of the internet has led to most of these addresses being “claimed,” although clever routing and address-translation techniques have generally kept the threat of “address exhaustion” at bay, buying some time for the deployment of IPv6 addressing.
A new IP addressing scheme, which uses a 128-bit register that can contain a whopping 3.4 x 10^38 possible addresses (about 8x10^28 times as many as standard IPv4). IPv6 is intended to solve the problem of address exhaustion, and also to simplify the allocation and routing of IP addresses. As yet, IPv6 is not in widespread use on the public network, but several large nations and corporations have committed to support it in the future. What impacts IPv6 will have on spamming and other forms of network abuse are as yet not fully known.
See also: http://en.wikipedia.org/wiki/Ipv6
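The address-space figures quoted for IPv4 and IPv6 follow directly from the register sizes, as this bit of arithmetic shows (the variable names are mine):

```python
ipv4_total = 2 ** 32    # addresses in a 32-bit register
ipv6_total = 2 ** 128   # addresses in a 128-bit register
ratio = ipv6_total // ipv4_total  # equals 2 ** 96

print(f"{ipv4_total:,}")      # 4,294,967,296 (about 4,300 million)
print(f"{ipv6_total:.1e}")    # 3.4e+38
print(f"{ratio:.1e}")         # 7.9e+28 (roughly 8 x 10^28)
```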
A programming language developed by Netscape; today, it is the most popular choice for “client-side” web automation.
See also: example of spam website obfuscated using JavaScript (elsewhere on this site).
JavaScripts can be embedded in web pages (or HTML mails) to provide visual effects (like “rollover links”) or to carry out automation chores (like rendering encrypted HTML markup). Spammers frequently use JavaScripts to disguise their message contents or to prevent the recipient from investigating the message.
(also “employment scam,” “payment processing scam”)
A fraud propagated via e-mail in which the scammer offers the victim a high-paying stay-at-home job, usually as a pretext to steal from him.
See also: example of a job scam (elsewhere on this site).
The “job” offered by the fraudster usually involves “payment processing;” the victim is sent a check (supposedly a customer payment) that he is instructed to deposit in his own bank account, forwarding the bulk of the payment (minus his “processing fee”) to the “boss.” The check invariably bounces after the victim has sent the money.
(named for Joe Doll of Joe’s CyberPost, canonically the first recorded victim of the practice)
A type of vindictive attack in which a spammer sends bulk mail deliberately implicating an innocent party as the source.
Joe-job attacks are frequently launched as retaliation for real or imagined injuries (such as being exposed as a spammer, or being kicked off a web forum), and often attempt to implicate the victim as a fraud artist, thief, drug dealer, child pornographer, or the like. Spams that use stolen from-addresses are a milder form of Joe-job.
(hacker/sysadmin jargon: “luser attitude readjustment tool;” adept hackers and admins often regard simple users (“losers” or “lusers”) with contempt)
The purging of inoperative or troublesome e-mail addresses from a spam mailing list, for example by planting beacon URLs in e-mails and collecting the results from a web server log.
The connotation here is that the list launderer is interested only in removing addresses that don’t work, and not in removing recipients who don’t want the spam.
(technically, a “mail user agent”)
A software application that helps users create and send outgoing e-mail, and receive and read incoming mail.
Mail clients are end-users’ “one-stop” software interface with the world of e-mail. They range from the old TTY-based Unix clients such as Mail, Elm, or Pine, to modern PC-based programs like Eudora, Microsoft Outlook, Mozilla Thunderbird, and Apple Mail. Mail clients are now invariably included in the software shipped with new computers, and are frequently “bundled” with web browsers as part of an internet software suite. Mail clients interact with remote mail hosts in order to pick up and transmit mail on behalf of the user. Mail clients are also sometimes known technically as mail user agents or MUAs.
(also MDA)
A technical name for a mail host, one specifically intended to hold mail for pickup by mail clients. It can receive mail from other hosts via SMTP relay, but does not typically relay mail itself (instead holding it for pickup via POP3 or IMAP). MDAs may also host special capabilities such as spam filtering or virus detection.
See also: procmail.
(also MX)
A technical name (in the context of DNS) for a mail host, specifically the mail transfer agent officially designated within DNS to receive mail on behalf of users of a given domain.
An external host with mail to send to a user in a given domain will use DNS to find the MX hosts for the domain, and will then deposit the mail with one of these MXs. For example, when you send mail to frank16@big-isp.foo, your mail host will use DNS to look up the MX record for “big-isp.foo” (which might, for instance, be "mx1.somewhere-else.foo”) and forward the mail to that host.
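When a domain lists several MX hosts, each MX record carries a preference number, and the sending host tries the lowest number first. The sketch below assumes the records have already been fetched from DNS; the (preference, host) pairs and the second host name are invented, and the sketch ignores the usual practice of randomizing among hosts with equal preference.

```python
def delivery_order(mx_records):
    """Sort (preference, host) pairs so the most-preferred MX comes first."""
    return [host for preference, host in sorted(mx_records)]


# Invented MX records for "big-isp.foo"; lower preference numbers are tried first.
records = [
    (20, "mx2.somewhere-else.foo"),
    (10, "mx1.somewhere-else.foo"),
]
print(delivery_order(records))
# ['mx1.somewhere-else.foo', 'mx2.somewhere-else.foo']
```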
The portion of a complete SMTP e-mail message packet that appears before the body, and that contains detailed technical information about the message, including routing lines.
See also: SMTP, exposing mail headers (elsewhere on this site).
The familiar lines showing the to-address and from-address of the message, its subject line, and its date, are technically not part of the SMTP header (as properly understood); they are easy to forge and do not contain any trustworthy information that can be used in spam tracing. The actual SMTP headers are normally hidden from users by their mail clients. Tracing the origins of spam requires exposing and analyzing the information in the SMTP header.
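Once the full headers are exposed, the routing lines can be pulled out with Python's standard email package. This is a minimal sketch: the message is a made-up example, and hops_oldest_first is a hypothetical helper. Each relaying host adds its routing line at the top, so reading them from the bottom up follows the message's path in order.

```python
from email import message_from_string

# A made-up message with two routing ("Received:") lines.
RAW_MESSAGE = """\
Received: from mx.example.org (mx.example.org [198.51.100.7]) by mda.example.org; Mon, 01 Jan 2024 00:00:05 +0000
Received: from mail.example.net (mail.example.net [192.0.2.10]) by mx.example.org; Mon, 01 Jan 2024 00:00:02 +0000
From: "A Friend" <friend@example.com>
To: you@example.org
Subject: Hello

Body text goes here.
"""


def hops_oldest_first(raw_message: str) -> list:
    """Return the routing lines in the order the message actually traveled."""
    message = message_from_string(raw_message)
    return list(reversed(message.get_all("Received", [])))


for hop in hops_oldest_first(RAW_MESSAGE):
    print(hop)
```

Bear in mind that a spammer can insert fictitious routing lines of his own, so only the lines added by hosts you trust can be taken at face value.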
A computer on the internet that is dedicated to transferring e-mail from senders to recipients. Mail hosts usually must be further identified as to the specific role they play in the mail process.
See also: mail exchanger, mail transfer agent, mail delivery agent, mail client.
In the most general case, the sender of an e-mail message usually transfers it from his computer (using his mail client) to a mail transfer agent (MTA) within his provider's domain. This MTA will then transfer the message to a mail exchanger (MX) for the recipient’s domain, identified via a DNS lookup. The MX usually passes the message to a mail delivery agent (MDA) for pickup by the recipient (via her computer and mail client).
(synonym for mail host)
(also MTA)
A technical synonym for mail host. The mail transfer agent is understood to be a host that does SMTP mail transfers to and from other MTAs (as well as from mail user agents, to mail exchangers, and to mail delivery agents).
A technical synonym for mail client.
A web automation program designed to collect messages from website visitors (via an HTML form) and deliver them privately to the webmaster via e-mail.
See also: formmail, example of mailback spam, using mailback scripts to avoid spam (both elsewhere on this site).
Formmail is one of the more popular of such programs, although there are many others. Mailback scripts can be used by website proprietors to get feedback from visitors without exposing an e-mail address to spambots, but misconfigured mailback scripts can also be exploited by spammers to send spam untraceably.
(wordplay on “mainstream” and “sleazy”)
A term used (by many anti-spammers) to describe an otherwise-respectable or “mainstream” company that uses spam in its advertising (and thus becomes a “sleazy” spamvertiser).
The stereotypical mainsleaze company claims no responsibility for mailings sent in its name, and does not respond meaningfully to spam complaints, preferring instead to pass the buck to the remailers with which it has contracted to distribute the spam. Naturally, it does not directly involve itself in controlling the behavior of these remailers, maintaining a degree of “plausible deniability” with regard to the spam.
Many companies, including some very famous mail-order merchants and speciality retailers, have fallen into this trap over the years, although there is evidence to show that the practice has retreated in the face of anti-spam publicity and legislation like CAN SPAM.
(see mail delivery agent)
(short for “mail filter”)
Extension software for mail filtering, added to a mail transfer agent (typically sendmail or postfix). Many milters are available to assist with blocking or filtering spam.
See also: Wikipedia entry for milter.
(“multipurpose internet mail extensions”)
A group of standards, defined in IETF RFC 2045 and associated documents, that define how data can be included in e-mails when they are not necessarily text in the ASCII character set (e.g., foreign language text, binary attachments).
See also: base64, encoded-word, quoted-printable, IETF RFC 2045.
MIME is a necessary update to the SMTP-based mail system because SMTP does not allow non-ASCII data to appear anywhere in e-mail messages.
MIME defines both (1) the structure of the e-mail body and (2) the methods used to encode the data in the body to make it safe for transmission. MIME allows one or more “chunks” of possibly related data (plain or formatted text, HTML markup, attached images and files) to be inserted into the body of a message packet. The forms of MIME encoding most often seen in conjunction with spam are base64, encoded-word encoding, and quoted-printable encoding; spammers abuse these techniques to disguise their messages during transit.
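Two of these encodings can be undone with Python's standard base64 and quopri modules. The sample strings below are invented, and the sketch handles only a bare encoded chunk, not a complete multi-part message body.

```python
import base64
import quopri

# base64: arbitrary bytes are rewritten using a 64-character safe alphabet.
encoded = base64.b64encode("Hello, world!".encode("ascii")).decode("ascii")
print(encoded)                                    # SGVsbG8sIHdvcmxkIQ==
print(base64.b64decode(encoded).decode("ascii"))  # Hello, world!

# quoted-printable: mostly readable text, with unsafe bytes written as =XX.
qp_chunk = b"Caf=E9"
print(quopri.decodestring(qp_chunk).decode("latin-1"))  # Café
```

Of the two, base64 hides the text completely from the naked eye, which is presumably why it is the more popular disguise.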