very strange DNS problem, (whoa , many responses) clarification

Charles Steinkuehler charles at steinkuehler.net
Thu Apr 24 15:17:09 CDT 2003


Walker (Zachary) Tippit wrote:
> I received many responses to this email, thank you
> all.  
> 
> I should clarify a couple of things -  I am the ISP in
> this situation.  The machine failing to do the lookup
> is the primary nameserver for said ISP, hosting 1200+
> domains.  My nameserver doesn't think it's doing local
> service for nbbc.edu.  Other .edu lookups work
> charmingly.  
>   So, as far as I know, the /etc/resolv.conf and
> /etc/nsswitch.conf are setup just fine.  For kicks, I
> added att.net's primary nameserver to the resolv.conf;
> It didn't change much, except now I get these results:

<snip host output>

> Without the -v switch, it just returns me to a prompt
> with no output.  Strange indeed.
> 
> I read somewhere that a named dump might be in order,
> so dump I did.  When searching through the dump, I
> found this:
> 
> 
> $ORIGIN edu.       
> [....]
>  
> nbbc    164772  IN      NS      NS1.nbbc.edu. 
> ;Cr=addtnl[192.55.83.32]
> 
>         164772  IN      NS      NS2.nbbc.edu.  
> ;Cr=addtnl [192.55.83.32]
> 
> excuse the wordwrap.  So it appears that bind knows
> where to look for this domain.  
> 
> In short, I am pretty sure this is the only domain I
> can't look up, but I am paranoid about there being
> others so I want to know why..  All of the usual
> solutions turned up nil. 

I think I may have a hint as to what's going on, and it looks like it's 
not your fault.

When doing non-recursive queries with dig (i.e. manually "walking" the DNS 
database starting with the top-level servers), I can properly resolve 
www.nbbc.edu.

If I simply send a recursive query with dig to a local name server, 
however, it fails on a "busy" system with lots of users, and works fine 
on my own system (I run a name server that only does recursive lookups 
for me and my wife).  This is a pretty sure sign that the problem is 
stale data cached somewhere in the DNS tree.
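
For anyone who wants to repeat the manual walk, here is a minimal sketch 
of the first step.  It only builds the dig command line and does not run 
it.  The helper name dig_norecurse is made up for illustration, and 
starting at a.root-servers.net is my assumption (any root server would 
do); +norecurse is dig's option for turning recursion off.

```python
def dig_norecurse(name, server, rtype="A"):
    """Build (but do not run) a dig command that asks one server with recursion off."""
    # Hypothetical helper: the name and argument order are illustrative only.
    return ["dig", "+norecurse", f"@{server}", name, rtype]


# Step 1 of the walk: ask a root server (assumed starting point).
# Each reply's AUTHORITY and ADDITIONAL sections name the servers to ask
# next; pass one of those back in as `server` and repeat until a reply
# carries an ANSWER section.
first_step = dig_norecurse("www.nbbc.edu", "a.root-servers.net")
print(" ".join(first_step))
```

Because every step goes straight to the server named in the previous 
reply, nothing comes out of a shared cache, which is why the walk can 
succeed while a recursive lookup through a busy server fails.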

A bit more investigation shows that the *CORRECT* IPs for the nbbc name 
servers *DIFFER* from what you've got in your named dump, above.  Output 
from my working system:

[root@basic root]# host www.nbbc.edu
www.nbbc.edu has address 207.250.169.8
[root@basic root]# dig www.nbbc.edu

; <<>> DiG 9.2.1 <<>> www.nbbc.edu
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 19749
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 4

;; QUESTION SECTION:
;www.nbbc.edu.                  IN      A

;; ANSWER SECTION:
www.nbbc.edu.           37741   IN      A       207.250.169.8

;; AUTHORITY SECTION:
nbbc.edu.               24660   IN      NS      ns1.nbbc.edu.
nbbc.edu.               24660   IN      NS      ns2.nbbc.edu.

;; ADDITIONAL SECTION:
ns1.nbbc.edu.           38029   IN      A       207.250.169.111
ns1.nbbc.edu.           38029   IN      A       207.250.169.11
ns2.nbbc.edu.           24660   IN      A       207.250.169.112
ns2.nbbc.edu.           24660   IN      A       207.250.169.12

;; Query time: 136 msec
;; SERVER: 216.171.153.129#53(216.171.153.129)
;; WHEN: Thu Apr 24 10:20:08 2003
;; MSG SIZE  rcvd: 146

Note the long TTL in your cache dump, above (164772 seconds, nearly two 
days), and the incorrect IP address.
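
To put those numbers in human terms, here is a small conversion of the 
TTL values quoted in this thread.  The variable names are mine; the 
figures are copied from the named dump and from the dig output above.

```python
from datetime import timedelta

# TTL on the nbbc NS lines in the named dump quoted above.
cached_ttl = timedelta(seconds=164772)

# TTL on the nbbc.edu NS records in the dig output from the working system.
working_ttl = timedelta(seconds=24660)

print("cached NS records expire in:", cached_ttl)
print("working system's NS records expire in:", working_ttl)
```

The dump entry still has close to two days left to live, so a server 
holding it will keep using it for that long unless its cache is cleared.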

I suspect the folks at nbbc.edu moved their nameservers (or renumbered 
their whole network), but didn't remember to crank down the TTL settings 
in their zone files beforehand.  As a result, they will have intermittent 
name resolution until all the stale information cached in DNS servers 
across the internet expires.
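
Here is a toy model of why a busy server fails while a quiet one works.  
It is only a sketch, not how BIND is implemented: the TtlCache class, the 
name ns1.example. and the documentation-range addresses are all invented 
for illustration, and the two-day TTL is an assumed value.

```python
class TtlCache:
    """Toy resolver cache: a record is reused until its TTL runs out."""

    def __init__(self):
        self._entries = {}  # name -> (address, expiry time in seconds)

    def lookup(self, name, now, authoritative, ttl):
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # still cached, the authoritative data is never consulted
        address = authoritative[name]  # cache miss or expired: fetch fresh data
        self._entries[name] = (address, now + ttl)
        return address


TTL = 172800  # assumed two-day TTL that was never lowered before the move
authoritative = {"ns1.example.": "192.0.2.1"}  # invented name and address

busy = TtlCache()   # many users, so the record was cached before the move
quiet = TtlCache()  # rarely used, so nothing is cached yet

busy.lookup("ns1.example.", 0, authoritative, TTL)

# The site renumbers its name server.
authoritative["ns1.example."] = "198.51.100.1"

busy_answer = busy.lookup("ns1.example.", 3600, authoritative, TTL)
quiet_answer = quiet.lookup("ns1.example.", 3600, authoritative, TTL)
after_expiry = busy.lookup("ns1.example.", TTL, authoritative, TTL)

print("busy cache, one hour after the move:", busy_answer)
print("quiet cache, one hour after the move:", quiet_answer)
print("busy cache, once the TTL has expired:", after_expiry)
```

The busy cache keeps handing out the old address until the TTL runs out, 
while the quiet cache gets the new one on its first lookup.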

It also looks like one of their nameservers may be offline, adding to 
their problems:

[root@falcon named]# dig  www.nbbc.edu @207.250.169.12

; <<>> DiG 9.2.1 <<>> www.nbbc.edu @207.250.169.12
;; global options:  printcmd
;; connection timed out; no servers could be reached

-- 
Charles Steinkuehler
charles at steinkuehler.net



