Page 1 of 2
#210417 - 2015-07-27 02:01 PM Advice on troubleshooting network issues
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
Hi guys,

Sorry for a long post, but I could do with some advice. I have spent several days in the last month troubleshooting a weird intermittent Internet issue. I think that I have resolved it, but would like suggestions on what further testing etc I can do.

Breakdown of Network

To give you a breakdown: it's a small primary school with 20 computers, 45 laptops and 45 iPads, more or less. There are two servers for database storage and one proxy/caching/filtering server for the school's Internet access.

The problem

Staff reported that the Internet was periodically slow and timing out. Not all the time though. Also sometimes the sending of email failed with the message that the DNS server could not contact servername blah blah. We have an internal mail server system.

Resolution

First thing that I did is that I connected the BT Router directly to the fibre optic modem and connected to the Internet through a direct link. Internet all working. Great, so the problem must be something on the Lan.

So after much investigation and getting nowhere, I disconnected the entire network and connected up one switch, the office computers and the teachers computers in the school. Everything else - wireless, ipads, kids computers and laptops were disconnected.

Internet and email seemed to work - so connected up the kids computers and a second switch, but left the wireless off. Internet continued to work (in the sense that nobody reported anything, but in hindsight maybe the Internet was slow), and the email worked most days, but we did have problems on two instances on different days sending emails.

So I suggested that we move forward the upgrade of the existing switches as only one switch was gigabit. Put in a brand new 48 port gigabit switch and connected kids computers and staff computers again to the new switch, but again no wireless and no iPads. Internet continued to work, but we were still having the odd issue with sending emails.

I decided to add the wireless at that point as the staff had been without wireless for a couple of weeks by then. Immediately on connecting the wireless, the Internet went down or was incredibly slow! So I thought aha - wireless was the issue. The wireless crashed the Internet for everyone - computers wired in (no wireless) as well as anyone on the actual wireless.

However

I tested each wireless access point and disconnected the two that seemed to cause the Internet to crash completely. That left three access points. However the next morning, they still complained of sent emails bouncing back with DNS server unable to locate server blah blah and the Internet crashed again around 11am and nothing online worked.

Puzzling Factors

1. When staff and kids went home - Internet worked beautifully even with every single wireless access point on. Why would this happen?
2. The BT Router did seem more sluggish with everything connected - the fewer computers on and the fewer WAPs on, the faster the webadmin pages on the router loaded. Why would this happen?
3. Even with the wireless off, while the Internet seemed to work, we still had issues with sending the odd email - so either we had two issues (the wireless crashing the Internet and something else) or we had one issue that was causing the wireless to crash the Internet and also causing problems sending emails. Did we have two separate issues that occurred at the same time or did we have one issue that was causing both problems?

Final Resolution

I realised that I was too close to the problem and was chasing every red herring. So I took a step back and said - what is the problem - The Internet. So I stopped looking at the wireless and the mail server etc and started diagnostics on solely the equipment required to get the Internet to work.

Connected everything up and naturally Internet went down, so I went into diagnostic mode

I took two computers and made changes on one and compared with the other.

1. I bypassed the proxy server, local Windows DNS and DHCP Server - No change
2. I changed the Internet DNS server in the tcp/ip settings. It appeared that OpenNIC's DNS Server was faster and more reliable than BT's DNS Server.
3. I changed the DNS Server to the BT DNS server in tcp/ip settings. In other words, I bypassed the BT Router as the DNS Server. Changed the other computer to use OpenNIC. Interestingly enough the BT DNS server was faster than OpenNIC's.

So it seemed that the actual BT Router itself was the problem as I effectively bypassed the DNS routing of the BT Router - I just used the router as a gateway.
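
A rough sketch of how the comparison in steps 2 and 3 could be repeated with numbers instead of impressions. It sends the same DNS question straight to each server over UDP and reports the median answer time and the number of timeouts. The function names and the router address are assumptions made for illustration, not anything from the actual setup, so substitute the router's LAN address and whichever resolvers are being compared.

```python
import random
import socket
import struct
import time


def build_query(hostname, query_id=None):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: id, flags (recursion desired), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ended by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN
    return header + qname + struct.pack("!HH", 1, 1)


def time_lookup(server, hostname, timeout=2.0):
    """Return seconds taken for server to answer, or None on timeout/error."""
    packet = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(packet, (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            return None
        elapsed = time.perf_counter() - start
    # Only count the reply if it carries the same query id we sent
    return elapsed if reply[:2] == packet[:2] else None


def compare(servers, hostnames, rounds=3):
    """Print the median response time and failure count per server."""
    for name, address in servers.items():
        samples = [time_lookup(address, h) for _ in range(rounds) for h in hostnames]
        good = sorted(s for s in samples if s is not None)
        failures = len(samples) - len(good)
        median = good[len(good) // 2] * 1000 if good else float("nan")
        print(f"{name:12} median {median:7.1f} ms, {failures} of {len(samples)} failed")


# Example call (addresses are placeholders, replace with your own):
# compare({"router": "192.168.1.254", "google": "8.8.8.8"}, ["example.com", "example.org"])
```

Running it once with the building empty and once on a busy day would show whether the router's answers slow down or time out under load while an outside resolver stays steady. Repeated lookups of the same name will mostly be answered from cache, so a longer and more varied hostname list gives a fairer picture.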

So I swopped over with another BT Router and everything seemed to work. Pages were loading faster on the actual BT Router and the dns resolution seemed to be working as fast as opennic.

So it would appear that the BT Router had some weird issue where if there was a lot of traffic on the Lan, this caused the BT Router to become unresponsive. Given that the BT Router had received a firmware update about a week before all the problems started and had rebooted on the very day that the problems started and given that the BT Router was having issues with dropped ports on different days - on the face of it - it would seem that all the issues were with the BT Router.

Where to now

My current issue is that by the time I had worked out that it must have been the Router all along, the kids and the staff had left - remember everything seems to work when everyone leaves the building! Although the new router does seem to be faster than the old one. It was also the last day of school and they won't be back until September! So I am unable to get the staff to check whether everything actually works until September.

I don't understand why the wireless or any Lan traffic would cause problems for the BT Router?

I am not 100% positive that I have resolved the issue (although I now believe it is the Router), but given that the school is closed until September and the problems only occur when people are using the network - I have no idea what I can do next to determine that all is well. I installed Wireshark and could not see what to do with the program - granted I was in a rush and it's not the best time to learn how to use a program that you have never used before.

Do you have any advice on whether my troubleshooting steps have correctly identified the issue, given the oddness that the wireless would cause such an outage, but only during the day when people were using the network? Is there anything further that I can do to investigate what is going on with this network?

I think that this is one area where my IT skills are very weak. I don't know how to monitor bandwidth traffic or identify a switch that is faulty or identify why any given Lan is slow - all these skills would have helped with troubleshooting this issue. My skills are more in Windows and Linux Servers, hardware troubleshooting of servers and computers. Obviously I can setup a network, but troubleshooting switches etc is not a skill that I have needed to use very often in the last 15 years as my clients have small networks.

I think that I need to get some network monitoring software that can alert me when there is unusual traffic or some problem.

Any advice on network monitoring software, as well as on what more I can do to troubleshoot this issue? Thanks
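
Until proper monitoring software is in place, something along these lines might do as a stopgap. It is only a sketch, with made-up function names and arbitrary thresholds. The idea is to time a TCP connection to the router's webadmin port every so often and print a line whenever the connection fails or takes much longer than the recent median, which is the same sluggishness noticed by hand in puzzling factor 2.

```python
import socket
import statistics
import time


def connect_time(host, port=80, timeout=3.0):
    """Seconds needed to open a TCP connection, or None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        return None


def is_unusual(history, sample, factor=3.0, min_history=5):
    """True when sample failed or is much slower than the recent median."""
    if sample is None:
        return True
    good = [h for h in history if h is not None]
    if len(good) < min_history:
        return False  # not enough data to judge yet
    return sample > factor * statistics.median(good)


def watch(host, interval=30, window=20):
    """Probe host forever, printing a timestamped line for each unusual sample."""
    history = []
    while True:
        sample = connect_time(host)
        if is_unusual(history, sample):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            shown = "no response" if sample is None else f"{sample * 1000:.0f} ms"
            print(f"{stamp} {host}: {shown}")
        # Keep only the most recent samples as the baseline
        history = (history + [sample])[-window:]
        time.sleep(interval)
```

Calling `watch()` with the router's LAN address from a wired machine and redirecting the output to a file would give a timestamped record to compare against what staff report in September. The factor of 3 and the 30 second interval are guesses and would need tuning.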

Top
#210418 - 2015-07-28 09:27 AM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Mart Moderator Offline
KiX Supporter
*****

Registered: 2002-03-27
Posts: 4672
Loc: The Netherlands
Pfieuw...talking about loooong posts

These issues are tricky to find especially if it is an intermittent issue. What to do to test if all is ok now? I'm not sure. I guess setting up some kind of process that generates traffic on the network just like it was a "normal" day would be best but I'm not sure how to set this up.
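
One possible way to fake a "normal" day, purely as a sketch: run a pool of worker threads that each fetch a list of web pages through the school's Internet connection, and count how many requests succeed. All names and numbers here are assumptions. It only produces wired HTTP traffic from a single machine, so it will not reproduce 45 wireless clients, but it may be enough to see whether the router copes with many simultaneous lookups and connections.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10):
    """Download one URL and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


def simulate_clients(urls, clients=20, requests_per_client=10, pause=1.0, fetch=fetch):
    """Run `clients` workers in parallel and return (successes, failures)."""

    def one_client(index):
        ok = failed = 0
        for n in range(requests_per_client):
            # Each worker starts at a different point in the list
            url = urls[(index + n) % len(urls)]
            try:
                fetch(url)
                ok += 1
            except Exception:
                failed += 1
            time.sleep(pause)
        return ok, failed

    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(one_client, range(clients)))
    return sum(r[0] for r in results), sum(r[1] for r in results)
```

Something like `simulate_clients(list_of_urls, clients=40)` with a list of sites the staff normally visit would be the obvious first run. Keep the pause reasonably long so the test does not hammer other people's web servers.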

For the monitoring part we have been using OpManager by ManageEngine for years and I'm very happy with it. There are some plugins available, for example for NetFlow. It can send notifications by SMS message and e-mail, but we only created an e-mail profile because SMS is so pre-2005. I did set a notification sound for this one specific e-mail address on my iPhone so it only makes some noise when an e-mail comes in from the monitoring system. No reason anymore to spend money on SMS messages.

ManageEngine: https://www.manageengine.com/
OpManager: https://www.manageengine.com/network-monitoring/
Customers: https://www.manageengine.com/network-monitoring/customers-list.html
_________________________
Mart

- Chuck Norris once sold ebay to ebay on ebay.

Top
#210419 - 2015-07-28 01:19 PM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Glenn Barnas Administrator Offline
KiX Supporter
*****

Registered: 2003-01-28
Posts: 4396
Loc: New Jersey
A couple of things that I always check when working on odd performance issues:

DNS - Use an internal DNS server (Required if you use any form of dynamic hostname registration!) and DO NOT use forwarders. Forwarders are often improperly used and when deployed for primary name resolution, you then limit yourself to just those servers, and any issues with those servers has a ripple effect in your network. Use the root hints on your internal server - that's what they're for.

Use a quality, managed switch - even if it's one you own and bring on-site for troubleshooting. The diagnostic information available is priceless. (I just bought a Cisco 4506 with dual power and 144 Gig-E POE ports for $300 USD when we had issues in our office. I have one at home as well and it interfaces well with OpenNMS - a free, commercial grade network monitor application.) With the data available, you can pinpoint the port(s) where errors are occurring.

Wiring - do a visual inspection and repair any termination that isn't 100%. I once visited a client who had 3 switches, 3 Internet connections (1 per switch) and 3 NICs in their server. 54 workstations and 1 server at the site. The tech told me it wasn't possible to put more than 16 hosts on a switch, hence the "3 of everything". We came in with a Cisco managed switch, moved everything over, and - sure enough - nothing worked! He also said he had one system that took 5 minutes to boot and he could not figure it out. Looking at that PC, I saw that the network cable came out of the ceiling, down the wall, and plugged right into the back of the PC's NIC. The jacket was stripped back about an inch, so no strain relief, and it used the wrong RJ45 plug. When I snipped the end off, he freaked out, saying he had just rewired the entire building. (Uh-oh!) I re-terminated the cable with the correct RJ45 plug type, plugged it in, and the system rebooted in about 40 seconds. At that point, we did a visual inspection and found that most of the cables were poorly terminated. We spent the entire day replacing the terminations with wall jacks and patch cords.

Cabling isn't magic, but it is an art, and it's easy to do things wrong. For example, the RJ45 plugs come in two types - the common (and cheap! around 5-7 cents ea) 2-point and the less common and more expensive (85 cents each) tri-point. The Tri-Point is designed for solid conductors, with two fingers on one side and one on the other side of the conductor. It will work for stranded cable as well, but that results in an expensive job. The 2-point plugs are designed for stranded conductor wire ONLY, and using them on solid conductor WILL result in a poor connection (just two tiny points of contact) that will often become a high-resistance connection if your strain-relief isn't rock solid. When it isn't, a tug on the cable can cause the wire to move, the insulation slides under one or both points, and that connection becomes poor or dead entirely. What had happened at this client was that enough bad connections were generating so much noise and retry traffic that the switch was unable to cope. After replacing the cable ends with proper terminations, we moved onto a single subnet using the new 48 and an existing 24 port switch, and eliminated 2 NICs and 2 Internet service connections.

Ideally, you use solid wire for a run from a patch panel to a wall jack, then use patch cords from the switch to patch panel and jack to computer. Those are the cables most likely to be damaged through use and thus are easy to examine and inexpensive to replace.

You seem to be on the right track for diagnosing the issue - use a single laptop (for consistent readings) and work from the Edge Router, Firewall, Switch, endpoint jack, first with no devices on the switch and then add them back. You'll need an assistant for this, but it goes quick and will help identify a specific cable or workstation that's causing the issue.

Glenn
_________________________
Actually I am a Rocket Scientist! \:D

Top
#210420 - 2015-07-28 05:57 PM Re: Advice on troubleshooting network issues [Re: Mart]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
I am busy looking into network monitoring quite heavily at the moment. I will add OpManager to the list. I am not sure why you suggest not using sms. If the Internet goes down - the monitoring program cannot email you! So from where I am standing, SMS makes sense!
Top
#210421 - 2015-07-28 06:19 PM Re: Advice on troubleshooting network issues [Re: Glenn Barnas]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
Hi Glenn,

Thanks for your response. Always helpful.

We have a brand new switch which I have used in several other schools so I know it's not the switch.

I will query regarding the cabling as the school are having building works going on - so you may have something there. However, hopefully the BT router was the problem. The new Router does seem to be faster.

I was actually going to buy Cables to Go patch leads for the cabinet and for the computers to replace all the existing cabling - Cables 2 Go. What do you think of these cables? I had a look at the specs and they seem to be higher than the average standard - ie 350 MHz, 24 AWG, gold plating - 50µm

I will ask my cabling guy if he has anything that can test for noise on the network as a result of poor terminations. That school does have cabling that was put in some years ago that was done by a professional company - sort of a handyman's job. Not that the issue has anything to do with this I think because the only thing that was not working was the Internet. Access to the servers worked, printers worked etc and the cabling was installed years ago. It is interesting to know that poor cabling can cause that kind of problem.

With regards to the DNS - I am not sure what you are recommending here. When I started my broadband service a couple of years ago, I set it up to use the Windows DNS Servers Root Hints Servers for Internet Resolution. This worked fine for a couple of months when all of a sudden the Internet stopped working at several clients - on the same day. I changed it so that the windows server forwards to the BT router which then uses the BT Internet Routers and I have never had a problem since. So I am reluctant to change it back to using the root hints given that the Internet stopped working at several clients on the same day!

At the moment the setup that I have is the clients use the windows server as the default server. Anything that the windows Dns server cannot resolve (ie non domain computers) gets forwarded to the BT router which then connects to the ISP Dns Servers. Are you saying that you would use the internal DNS server for both internal Domain Name resolution and for Internet resolution?

I will have a look at this OpenNMS that you refer to. Maybe I can get it to work with the new switch we bought - a Cisco SLM2048PT-UK

I think my first port of call would be to install some network monitoring software to see if there is any traffic that is unusual. I am busy researching that now.

Thanks for the input. Helpful to know that I have not missed too much.

Top
#210422 - 2015-07-28 10:24 PM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
with all that I've seen, I would be amazed if it wasn't the BT's router that is acting up.

Them routers can start acting up with 1 client on the network. You should not rely on that device to do your DNS (on top of other stuff it is already doing which are on top of actually routing your traffic) but have a dedicated in house server. What comes to root hints not working at several clients, something was clearly wrong and now you can never know what caused that to fail. Switching to the router as "dns proxy" just isn't the best solution (how remote-hackable that thing is?)

for cable issues, the switch will be able to tell you bad drops with error counters. If you have bad traffic instead (flood for example), you would want to put a monitoring system in place.
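
To make the error-counter idea a bit more concrete, here is a minimal sketch. It assumes the per-port packet and error counts have been read off the switch twice (by hand from its web interface, or via SNMP) and simply flags ports whose errors grew out of proportion to their traffic. The function name, the snapshot format and the 0.1% threshold are illustrative assumptions, not anything a particular switch provides.

```python
def flag_bad_ports(before, after, max_error_ratio=0.001):
    """Compare two counter snapshots and return ports whose error ratio is too high.

    Each snapshot maps a port name to a (packets, errors) tuple.
    """
    suspects = {}
    for port, (packets_after, errors_after) in after.items():
        packets_before, errors_before = before.get(port, (0, 0))
        packets = packets_after - packets_before
        errors = errors_after - errors_before
        if packets <= 0:
            # No traffic at all, yet errors on an idle port are still worth a look
            if errors > 0:
                suspects[port] = float("inf")
            continue
        ratio = errors / packets
        if ratio > max_error_ratio:
            suspects[port] = ratio
    return suspects
```

Taking one snapshot in the morning and one in the afternoon of a school day, then checking the cable and the device on whichever ports come back, would be one way to use it.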
_________________________
!

download KiXnet

Top
#210427 - 2015-07-29 11:05 AM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Mart Moderator Offline
KiX Supporter
*****

Registered: 2002-03-27
Posts: 4672
Loc: The Netherlands
 Originally Posted By: Robdutoit

...
I am not sure why you suggest not using sms. If the Internet goes down - the monitoring program cannot email you! So from where I am standing, SMS makes sense!


We had it sending SMS messages and e-mail for some time but switched to just e-mail because SMS has a limit for the number of characters used in the message. Second issue we had was that we did not have the budget at that time to get an internal modem that used a SIM card just like a mobile phone and the same thing as a USB dongle were not that common and useable yet ( \:o must have been in the stone age. Time flies .... ) so we hooked up a mobile phone with a USB data cable. It worked but there was lots of room for improvement. It is also just a matter of personal preferences and the requirements of the client I guess. For us the e-mail notifications are enough.
_________________________
Mart

- Chuck Norris once sold ebay to ebay on ebay.

Top
#210428 - 2015-07-29 11:42 AM Re: Advice on troubleshooting network issues [Re: Mart]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
I do agree that SMS is not suitable for everything. I would rather have email as the default, but SMS for something like the Internet - or perhaps have a setting where it sends an SMS if it is unable to email. The two things that annoy clients most are printers and Internet not working. Those two I would consider to be a priority! So SMS would be essential if the Internet went down.
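
The "send an SMS if it is unable to email" setting could look roughly like this. It is a sketch only: `send_email` and `send_sms` are hypothetical callables standing in for whatever the monitoring tool or SMS modem actually offers, and the message is trimmed to 160 characters because of the SMS length limit.

```python
def send_alert(message, send_email, send_sms, sms_limit=160):
    """Try email first; fall back to a shortened SMS if email fails.

    send_email and send_sms are callables supplied by the caller that raise
    an exception on failure. Returns the name of the channel that worked.
    """
    try:
        send_email(message)
        return "email"
    except Exception as email_error:
        # Email needs a working Internet link, which is exactly what may be down
        short = message if len(message) <= sms_limit else message[: sms_limit - 3] + "..."
        try:
            send_sms(short)
            return "sms"
        except Exception as sms_error:
            raise RuntimeError(
                f"alert not delivered: email failed ({email_error}), sms failed ({sms_error})"
            ) from sms_error
```

Passing the two senders in as parameters keeps the fallback logic separate from the delivery method, so it does not matter whether the SMS goes out through a USB dongle, a phone on a data cable or some gateway service.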
Top
#210429 - 2015-07-29 11:53 AM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
What I am going to be doing over August - on my extremely long list of things to do - I will be setting up two nics in the proxy server and I will connect the BT Router up to the second nic and the first nic up to the switch. This will address the issue where the more clients are added to the network, the slower the BT Router loads its webadmin pages.

As both of you have suggested using internal servers as the DNS Servers, I will have a look into this again. But I will check on Spiceworks to see if anyone can shed any light as to why several clients' Internet stopped working on the same day two years ago when I was using the Server DNS root hints. Would it be something to do with Windows servers not being updated regularly? At that time I did not have WSUS set up. I do now. Maybe Windows updates need to update the root hints?

Could you elaborate on why a BT Router would be affected by traffic on the Lan, Lonk? As far as I understand things, the BT router is a switch in the sense that it would only respond to traffic if the traffic was meant for it - unlike a hub. It is a router in that it would only route traffic to the outside world that actually needed to be routed and as for dns resolution - it should not be that onerous a task to resolve a dozen website addresses at any given time - it's not like it has to resolve 500 addresses in one minute - given the number of clients on the Lan. I just can't understand why especially the wireless would impact the BT Router so much.

If I use internal DNS on the windows server, then I presume that there is no point in getting a draytek Vigor Router? If Dns will be done by the windows Server, firewall done by the Internet Box (although the BT router by default blocks most incoming connections) and if the BT Router is connected only to the second nic on the Linux Box - then it only receives traffic that is being routed to the Internet. In that case, I may not bother with the draytek router.

Top
#210430 - 2015-07-29 12:57 PM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Glenn Barnas Administrator Offline
KiX Supporter
*****

Registered: 2003-01-28
Posts: 4396
Loc: New Jersey
The DNS bug was the result of a bad hotfix, quite a long time ago. A fix was released within 24 hours, but the reputational damage was done, and the "fix" that most people implemented was to use their ISP's DNS servers in their forwarders list.

Even worse, I've seen IT staff deploy public DNS servers through their DHCP. Sometimes they include this in addition to their internal servers and sometimes instead. I know one IT guy that uses his old ISP's DNS servers at half a dozen or so remote sites of a company he supports and wondered why A) he could not join workstations to AD, and B) why he had to point to their RDS servers by IP. Just last month, a developer suggested that Google's DNS (8.8.8.8) be added to the internal network. They get DHCP from their firewall, and made a request to the ISP to add this to the DNS server list. Anyone want to guess how long it took to lose connection with AD? Or who they called, hopping mad, when "AD Infrastructure Failed"? \:D

If I recall, this bad patch was shortly after Server 2003 was released, so about 11 years ago, and affected NT, Win2K, and Win2K3 platforms. This is a long time to hold a grudge. ;\) I've successfully used Windows DNS without forwarders since first publishing an article on NT Network Basics back in 1997-98.

Glenn
_________________________
Actually I am a Rocket Scientist! \:D

Top
#210431 - 2015-07-29 01:26 PM Re: Advice on troubleshooting network issues [Re: Glenn Barnas]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
It cannot be a result of this bad hotfix. This occurred two years ago on windows server 2008 so not relevant to what you are talking about. Also at the time, I was not updating the servers regularly - I was doing them manually and there had been no updates around that time.

I will try this again and see if I have an issue again.

I am glad to say that I am not as stupid as some people. Why the hell would anyone want to use public dns servers in dhcp - do these people not understand that Active Directory and Windows logon etc require internal DNS?

Anyway you have convinced me to have a look at using root hints again. I suspect that you do need to run Windows Update on servers for root hints to work all the time - it's the only explanation I have for why several clients failed at the exact same time - no changes at any of the clients, no updates on the servers etc. So it's the only thing I can conclude.

It will also help to reduce the load on the BT Router in the event that this was causing the problem, although not anymore because I put a new router in.

Top
#210433 - 2015-07-30 12:17 AM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Glenn Barnas Administrator Offline
KiX Supporter
*****

Registered: 2003-01-28
Posts: 4396
Loc: New Jersey
Well - there was an update to the root hints - back in late 2012, I believe, but NOT updating should not have resulted in a total failure.

Yeah - it's amazing what people try to do!

Glenn
_________________________
Actually I am a Rocket Scientist! \:D

Top
#210438 - 2015-07-30 10:51 AM Re: Advice on troubleshooting network issues [Re: Glenn Barnas]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
I know where to find you if the root hints fail. It would be interesting to see if it does it again. \:\)
Top
#210440 - 2015-07-30 01:44 PM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
Tbh, I have seen that happen several times in my years. Unlike Glenn, I am not against forwarders. If you have reliable ones they are better than not using them.

There has been several attacks on root servers and it seems to always affect DNS resolution in Europe way worse than in the US.
_________________________
!

download KiXnet

Top
#210442 - 2015-07-30 04:21 PM Re: Advice on troubleshooting network issues [Re: Lonkero]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
Thanks Lonk. It's good to know that it has happened to someone else before. Well we will see in September what gives!
Top
#210446 - 2015-07-31 03:09 AM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
Around 2010 there seemed to be a bug in Windows that kept corrupting the cache and resetting the server helped just temporarily. That's about the time I started loving forwarders again.
_________________________
!

download KiXnet

Top
#210454 - 2015-07-31 04:32 PM Re: Advice on troubleshooting network issues [Re: Lonkero]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
I have just come back from the client as I had to go there to apply the remote access vpn settings which I forgot to do. Looking through the logs, it seems that WAN is perfect and Port 4 (which is currently connected to the switch) is perfect. No drops. However ports 1-3 have experienced one or two drops in the last fortnight which is really weird considering that ports 1-3 have nothing plugged into them!

I have looked at all my other clients and none of them are experiencing drops on the other three ports. However, my other clients are using BT Hub version 5 which is integrated with the modem. This particular client cannot use the modem in that hub because it's fibre to the premises so it works differently. Naturally!

I am not seeing any reason why the second router would also be showing dropped ports for ports 1-3 (considering there is nothing plugged into them), but maybe the latest firmware update might have something to do with that? I will contact BT and ask them.

I think what I will do is get the BT hub connected up to the second nic in the gateway and next year replace all the BT Hubs with either Netgear or Vigor Draytek broadband routers.

Top
#210455 - 2015-07-31 07:42 PM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
surely they are not using hubs...

so, you have incoming cable coming into a router, after which you have a hub, after which you have a switch after which you have another router, after which it goes back to the switch and to your workstations?

something does not add up...


Edited by Lonkero (2015-07-31 07:46 PM)
_________________________
!

download KiXnet

Top
#210456 - 2015-07-31 08:58 PM Re: Advice on troubleshooting network issues [Re: Lonkero]
Robdutoit Offline
Hey THIS is FUN
***

Registered: 2012-03-27
Posts: 363
Loc: London, England
Sorry bad typo - The BT Hub I meant is the BT Router - there is no hub. The current setup is internet line comes into VDSL Modem which then connects to WAN port on BT Router which then connects to the Switch and then on to all the other computers.

As I said I am changing this so that it will be modem-BT Router - Linux Gateway - Switch - Computers. I did not do this originally because when I was setting up the Linux Gateway I could not get it to work with two nics.

Top
#210457 - 2015-07-31 11:01 PM Re: Advice on troubleshooting network issues [Re: Robdutoit]
Lonkero Administrator Offline
KiX Master Guru
*****

Registered: 2001-06-05
Posts: 22346
Loc: OK
That's weird. Sounds like you could ditch the BT router completely and run VDSL straight to your gateway. That's how I have even my home setup.
_________________________
!

download KiXnet

Top