From: Eliot Lear

The following was written by Dr. Charles Hedrick of Rutgers University sometime in 1985. Please read it with the understanding that rule numbers are nothing more than function names. For further reference, I suggest the Sun Tutorial on Sendmail in their manuals.

-eliot

Command: followup
Newsgroups: net.unix-wizards,net.mail
To: steve@jplgodo.UUCP
Subject: a brief tutorial on sendmail rules
Distribution:
References: <902@rlgvax.UUCP> <545@jplgodo.UUCP>

A previous message suggested using "sendmail -bt" to see how sendmail is going to process an address. This is indeed a handy command for testing how an address will be processed. However the instructions given were not quite right. To see how sendmail is going to deliver mail to a given address, a reasonable thing to type is

    sendmail -bt
    0,4 address

Even this isn't quite right, but with "normal" rule sets it should work. Because there is so much confusion about sendmail rules, the rest of this message contains a brief tutorial.

My own opinion of sendmail is that it is quite a good piece of work. Many people have complained about the difficulty of understanding sendmail rule sets. However I have also worked with mailers that code address processing directly into the program. I much prefer sendmail. The real problem is not with sendmail, but with the rules. The rules normally shipped from Berkeley have lots of code that does strange Berkeley-specific things, and they are not commented. Also, typical complex rule sets are trying to handle lots of things, forwarding mail among several different mail systems with incompatible addressing conventions. A rule set to handle just old-style (non-domain) UUCP mail would be very simple and easy to understand. But real rule sets are not doing simple things, so they are not simple.

For those not familiar with sendmail, -bt invokes the rule tester. It lets you type a set of rule numbers and an address, and then shows you what the rules will do to that address.
In addition, rule test mode automatically applies rule 3 before whatever rule you ask it to apply. As we will see shortly, this is a reasonable thing to do.

Before describing the rule sets, let me define two terms: "header" and "envelope". Header refers to the lines at the beginning of the message, starting with "from:", "to:", "subject:", etc. Sendmail does process these lines. E.g. with uucp mail it will add its own host name at the beginning of the from line, so that the final recipient stands some chance of replying to the message. However sendmail normally does not depend upon the from and to lines to perform its actual delivery. It has more direct knowledge, passed on to it from the program that generated the mail, or if it came from another site, the mailer at that site. This information is referred to as the "envelope", since it is like the addresses on the outside of an envelope. For Arpanet mail, the envelope is passed to the next site by the MAIL FROM: and RCPT TO: commands. For UUCP mail, it is passed on as arguments to the remote rmail command.

To see why there have to be separate addresses "on the envelope", consider what happens when you send mail to "john@vax, mary@sun". Two copies of the message will be dispatched, one to vax and the other to sun. The "to: " line in the headers will show both addresses. However the envelope will show only the address that this copy is meant to go to. The copy sent to vax will show "john@vax" and the copy sent to sun will show "mary@sun". If sendmail had to look at the "to: " line, it would never know which of the addresses shown there it was responsible for handling.

Anyway, here is what the rules do:

3: always done first. This turns addresses from their normal textual form into a form that the rest of the rules understand. In most cases, all it does is put < > around the name of the host that is next in line. Thus foo@bar turns into foo<@bar>. However it also does a few transformations. E.g.
it turns foo!bar!user into bar!user<@foo.UUCP>. Since sendmail accepts either ! syntax or @....UUCP syntax, rule 3 standardizes on @ syntax. It also does a few other minor things. But you won't be far off if you just think of it as adding < > around the host name.

4: always done last. This turns addresses from internal form back into external form. It removes the < > around the host name, and turns foo@bar.UUCP back into bar!foo. Again, there are one or two other minor things, but you won't be too far off if you think of 4 as just removing the < > around the host name.

0: This is the rule that handles the destination address on the envelope. It is in some sense the primary rule. It returns a triple: protocol, host, user. The protocol is usually one of local, TCP, or UUCP. At the moment, it figures this out syntactically. In our rule set, hosts ending in .UUCP are handled by UUCP, the current host is local, and everything else is TCP. As domains are integrated into UUCP, obviously this rule is going to change. This rule does very little other than simply look at the format of the host name, though as usual a few other details are involved (e.g. it removes the local host, so myhost!foo!bar will be sent directly to foo).

1 and 2 are protocol-independent transformations used for sender and recipient lines in the header (i.e. from: and to: lines). In our rule sets, they don't do anything.

Each protocol has its own rules to use for sender and recipient lines in the header. E.g. UUCP rules might add the local host name to the beginning of the from line and remove it from the to line. In our rule set, the complexities in these rules are primarily caused by forwarding between UUCP and TCP. The line that defines the mailer for a protocol lists the rule to use for source and recipient, in the S= and R=.

Finally, here is the exact sequence in which these rules are used.
For example, the first line means that the destination specified in the envelope is processed first by rule 3, then rule 0, then rule 4.

    envelope recipient: 3,0,4     [actually rule 4 is applied only to the user name portion of what rule 0 returns]
    envelope sender:    3,1,4
    header recipient:   3,2,xx,4  [xx is the rule number specified in R=]
    header sender:      3,1,xx,4  [xx is the rule number specified in S=]

I have the impression that the sender from the envelope (the return-path) may actually get processed twice, once by 3,1,4 and the second time by 3,1,xx,4. However I'm not sure about that.

Now for the format of the rules themselves. I'm just going to show some examples, since sendmail comes with a reference manual, which you can refer to. However these examples are probably enough to let you understand any set of rules that makes sense in the first place (which the normal rules do not).

This example is from our UUCP definition. It is a simplified version of the set of rules used to process the sender specification. As such, the major thing it has to do is to add our host name to the beginning, so that the guy at the end will know that the mail went through us.

    S13
    R$+<@$-.UUCP>    $2!$1      u@host.UUCP => host!u
    R$=U!$+          $2         strip local name
    R$+              $:$U!$1    stick on our host name

Briefly, the first rule turns the address from the form foo<@bar.UUCP> back into bar!foo. The second rule removes our local host name, if it happens to be there already, so we don't get it twice. The third rule adds our host name to the beginning.

S13 says that this is the beginning of a new rule set, number 13.

    R$+<@$-.UUCP>    $2!$1      u@host.UUCP => host!u

R says that this is a rule. The thing immediately after it, $+<@$-.UUCP>, is a pattern. If this pattern matches the address, then the rule "triggers". If the rule triggers, the address is replaced with the "right hand side", i.e. what is after the tab(s). In this rule, the right hand side is $2!$1. The thing after the next tab(s) is a comment.
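The three fields of a rule line can be pulled apart mechanically. Here is a rough Python sketch of that split. It is only an illustration: the function name parse_rule is made up here, and it assumes the fields are separated by real tab characters, as described above.

```python
import re

def parse_rule(line):
    """Split one R line into (pattern, right hand side, comment)."""
    if not line.startswith("R"):
        raise ValueError("not a rule line")
    # Fields are separated by one or more tabs; the comment is optional.
    fields = re.split(r"\t+", line[1:].rstrip("\n"))
    pattern = fields[0]
    rhs = fields[1] if len(fields) > 1 else ""
    comment = fields[2] if len(fields) > 2 else ""
    return pattern, rhs, comment

# The first rule of rule set 13, written with explicit tabs.
print(parse_rule("R$+<@$-.UUCP>\t$2!$1\tu@host.UUCP => host!u"))
```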
This rule is used in processing UUCP addresses. As noted above, by the time we get to it, rule 3 has already been applied. So if we had a UUCP address of the form host1!host2!user, it would now be in the form host2!user<@host1.UUCP>. This does match the pattern:

    $+         <@ $-    .UUCP>
    host2!user <@ host1 .UUCP>

$+ and $- are "wildcards". $- will match exactly one word (token), while $+ will match one or more. (By the way, with the increasing use of domains, this production should probably use $+.UUCP, not $-.UUCP.) Since the pattern matches, we replace this with the "right hand side" of the rule, $2!$1. $ followed by a digit means the Nth thing matched by a wildcard. In this case there were two wildcards, so

    $1 = host2!user
    $2 = host1

The final result is

    host1!host2!user

As you can see, we have simply turned UUCP addresses from the format produced by rule 3 back into normal ! format.

The second rule is

    R$=U!$+          $2         strip local name

This is needed because there are situations in which our host name ends up on the beginning of the recipient address. Since we are about to add our host name, we don't want it to be there twice. So if it was there before, we remove it. $= is used to see if something is a member of a specified "class". U happens to be a list of our UUCP host name and any nicknames. So $=U!$+ matches any address that begins with our host name or nickname, then !, then anything else. Suppose we had topaz!host1!host2!user. The match would be

    $=U   ! $+
    topaz ! host1!host2!user

The result of the match is that

    $1 = topaz
    $2 = host1!host2!user

Since the right hand side of this rule is simply "$2", the result is

    host1!host2!user

I.e. we have removed the topaz from the beginning.

By the way, the class U used by the rule would have been defined earlier in the file by the statement

    CUtopaz ru-topaz

C defines a class. U is the name of the class. The rest of the line is the list of things that will be in the class.
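As a rough cross-check of the two matches worked through above, here is a small Python sketch. The regular expressions are only stand-ins for sendmail's wildcards (sendmail matches on tokens, not characters), and the names ONE_TOKEN, ANY_TOKENS, uucp_to_bang and strip_local are made up for this illustration. Each function applies its rule a single time.

```python
import re

# Assumed regex approximations of the wildcards; not sendmail's real matcher.
ONE_TOKEN = r"[^.@!<>]+"           # $-  : exactly one token
ANY_TOKENS = r".+"                 # $+  : one or more tokens
CLASS_U = ["topaz", "ru-topaz"]    # contents of class U, from the CU line

def uucp_to_bang(addr):
    """R$+<@$-.UUCP>  $2!$1   (u@host.UUCP => host!u)"""
    m = re.fullmatch(rf"({ANY_TOKENS})<@({ONE_TOKEN})\.UUCP>", addr)
    return f"{m.group(2)}!{m.group(1)}" if m else addr

def strip_local(addr):
    """R$=U!$+  $2   (strip local name)"""
    names = "|".join(re.escape(n) for n in CLASS_U)
    m = re.fullmatch(rf"({names})!({ANY_TOKENS})", addr)
    return m.group(2) if m else addr

print(uucp_to_bang("host2!user<@host1.UUCP>"))   # host1!host2!user
print(strip_local("topaz!host1!host2!user"))     # host1!host2!user
```

An address that does not match a pattern is returned unchanged, which mirrors a rule that does not trigger.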
Finally we have the rule

    R$+              $:$U!$1    stick on our host name

The $+ matches anything. In this case the name is host1!host2!user, so the result of the match is

    $1 = host1!host2!user

The result looks slightly obscure. $: is a tag that says to do this only once. The problem is that this rule always applies, since the pattern matches anything. Normally, rules are applied over and over, as long as they apply. In this case, the result would be an infinite loop. Putting $: at the beginning says to do it only once. $U says to use the value of the macro U. Earlier in the file we defined U as our UUCP host name, with a definition

    DUtopaz

Note that there can be a class and a macro with the same name. $=U tests whether something is in the class U. $U is replaced by the value of the macro U. So the final value of this rule, $:$U!$1, is

    topaz!host1!host2!user

So this rule has managed to add our host name to the beginning, as it was supposed to. Since there are no further rules in the set (the next line is the end of file or the beginning of a new rule set), this value is returned.

There are several more magic things that can appear in a pattern. The most important are:

$* - this is another wild card. It is similar to $+, but $+ matches anything, whereas $* matches both anything and nothing. I.e. $+ matches 1 or more tokens and $* matches 0 or more tokens. So here is a list of the wildcards I have mentioned:

    $*     0 or more
    $+     1 or more
    $-     exactly 1
    $=x    any member of class x

A typical example of $* is a production where we aren't sure whether the user name is before or after the host name:

    R$*<@$+.UUCP>$*    $@$1<@$2.UUCP>$3

This production would test for the host name ending in .UUCP, and return immediately. $@ is a flag you haven't seen yet. It is simply a return statement. It causes the right hand side of this rule to be returned as the final value of this rule set.

The other magic thing I will mention is $>. This is a subroutine call.
Here is an example taken from rule set 24, which is used to process recipients in TCP mail. Its purpose is to handle the situation where we might have an address like topaz!user@red. (Our host name is topaz. Red is a local host that we talk to via TCP.) I.e. someone is asking us to relay mail to red. Rule 3 will have turned this into user@red<@topaz.UUCP>. What we want to do is get rid of the topaz.UUCP and treat red as the host. (Rule set 0 would do this for the recipient on the envelope. This rule is used for the to: field in the header.) Here is the rule.

    R$+<@$=U.UUCP>    $@$>9$1    in case local!a@b

The pattern matches our example, as follows:

    $+       <@ $=U   .UUCP>
    user@red <@ topaz .UUCP>

Recall that $+ matches anything and $=U tests whether something is our UUCP host name or one of our nicknames. The result of the match is

    $1 = user@red
    $2 = topaz

The right hand side is $@$>9$1. The $@ is the tag saying to stop the rule set here and return this value. $>9 is a subroutine call. It says to take the right hand side, pass it to rule set 9, and then use the value of rule set 9. The actual right hand side is simply $1, which in this case is user@red. Here is rule set 9:

    S9
    R$*<$*>$*    $1$2$3      defocus
    R$+          $:$>3$1     make canonical
    R$+          $@$>24$1    and do 24 again

The first rule simply removes < >. It is sort of a quick and dirty version of rule 4. In fact we have no < > left, since we have removed the <@topaz.UUCP>. So this rule does not trigger. (Now that I think about it, I suspect it is probably never going to trigger, and so is not needed.)

The next rule is a simple subroutine call. It matches anything ($+ matches any 1 or more tokens). The right hand side is

    $:$>3$1

The $: says to do it only once. Since the rule matches anything, you need this, or you will have an infinite loop. The $>3 says to call rule 3 as a subroutine. The $1 is the actual right hand side. Since the left hand side matched the whole address, what this rule does is simply call rule set 3 on the whole address.
Recall that rule set 3 basically locates the host name and puts < > around it. So in this case the result is user<@red>. As you can see, it was not enough to remove <@topaz.UUCP>. That leaves us with no host name. We have to call rule 3 to find the current host name and put < > around it.

The last rule is really just a goto statement. The pattern is $+, which matches anything, so it always triggers. The right hand side is $@$>24$1. The $@ is the return tag. It says to stop this rule set and return that value. $>24 says to call rule set 24. The actual right hand side is $1, so we call rule set 24 with the whole address. If you recall, this ruleset (9) was called from the middle of 24 when we found user@red<@topaz.UUCP>. So what we have done is to change this into user<@red> and say to start rule set 24 over again.

I hope you have found this exposition useful. As a final convenience, here is a "reference card" for reading rule sets. Note that this contains only operators used by the rules. There are plenty of other facilities used in the configuration section which I am not documenting here. (I'd love to see someone produce a complete reference card.)

wildcards:
    $*     0 or more tokens
    $+     1 or more tokens
    $-     exactly one token
    $=x    member of class x (x must be a letter, lower/upper case distinct)
    $~x    not a member of class x

macro values (usable in pattern or on right hand side)
    $x     value of macro x (x must be a letter, lower/upper case distinct)
           At least on the Pyramid, $x is replaced by the macro's value when the sendmail.cf file is being read in.

on the right hand side:
    $n     string matched by the Nth wildcard
    $>n    call rule set N as a subroutine
    $@     return
    $:     only do this rule once

in rule 0, defining the return value
    $#     protocol
    $@     host
    $:     user

Rutgers extensions, usable only on right hand side
    $%n    take the string matched by the Nth wildcard, look it up in /etc/hosts, and if found use the primary host name
    $&x    use the current value of macro x. x must be a letter. Upper and lower case are treated as distinct.
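
To tie the pieces together, here is a rough Python sketch of how rule set 13 from the tutorial is applied: each rule is retried for as long as it matches, $: limits a rule to one application, and $@ returns at once. This is an illustration under stated assumptions, not sendmail's actual matcher. The regular expressions only approximate the token-based wildcards, and the names S13, run_ruleset and max_passes are invented for the example. The max_passes cap is merely a guard for the sketch so that a looping rule cannot hang it.

```python
import re

MACRO_U = "topaz"                  # DUtopaz
CLASS_U = ("topaz", "ru-topaz")    # CUtopaz ru-topaz
_U = "|".join(re.escape(n) for n in CLASS_U)

# Each entry: (regex standing in for the pattern, right hand side, flag).
# flag is "" (retry while it matches), "$:" (apply once) or "$@" (return).
S13 = [
    (rf"(.+)<@([^.@!<>]+)\.UUCP>", lambda m: f"{m[2]}!{m[1]}", ""),       # u@host.UUCP => host!u
    (rf"({_U})!(.+)",              lambda m: m[2],             ""),       # strip local name
    (r"(.+)",                      lambda m: f"{MACRO_U}!{m[1]}", "$:"),  # stick on our host name
]

def run_ruleset(rules, addr, max_passes=100):
    """Apply each rule in order, rewriting addr as described in the text."""
    for regex, rhs, flag in rules:
        for _ in range(max_passes):
            m = re.fullmatch(regex, addr)
            if m is None:
                break              # rule does not trigger; go to the next rule
            addr = rhs(m)
            if flag == "$@":
                return addr        # return from the rule set immediately
            if flag == "$:":
                break              # apply this rule only once
    return addr

print(run_ruleset(S13, "host2!user<@host1.UUCP>"))   # topaz!host1!host2!user
print(run_ruleset(S13, "user<@topaz.UUCP>"))         # topaz!user
```

The second call shows why the strip rule is there: without it the sender would come out as topaz!topaz!user.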