Why do you need NTP (Network Time Protocol)?

What would happen if you and your friends decided to go on a vacation and the train was scheduled to leave at a fixed time, but none of you had a watch showing the correct time? No one would arrive at the right time, and some might even miss the train. That is why we have a standard time called UTC (Coordinated Universal Time), and clocks across the world run in sync with it. The local time varies by timezone, but all the clocks in the same timezone should show the same time. Indian Standard Time (IST) is 5 hours and 30 minutes ahead of UTC, so every clock in India follows UTC+5:30.
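To make the UTC-to-IST relationship concrete, here is a minimal Python sketch. The function name `utc_to_ist` and the sample instant are made up for illustration; the only fact taken from the text is the fixed offset of 5 hours 30 minutes.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed offset of +5:30 from UTC (no daylight saving time).
IST = timezone(timedelta(hours=5, minutes=30), name="IST")

def utc_to_ist(moment_utc: datetime) -> datetime:
    """Convert a timezone-aware UTC datetime to IST (hypothetical helper)."""
    return moment_utc.astimezone(IST)

# Sample instant chosen purely for illustration.
noon_utc = datetime(2016, 1, 1, 12, 0, tzinfo=timezone.utc)
print(utc_to_ist(noon_utc).isoformat())  # 2016-01-01T17:30:00+05:30
```

Both values describe the same instant; only the wall-clock reading differs, which is exactly why a shared reference like UTC is useful.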

In network management it is vital that all the devices follow the same time. Why, you ask? Here are some reasons:

  • To sync your watch’s time.
  • To launch that rocket scheduled for Mars.
  • To start an online party on time.
  • To catch a flight on time.
  • To monitor a network and know when the router/switch/firewall/system broke down.

If you read a story in a newspaper and want to know when it took place, you look at the newspaper's date of publication. We need an anchor, a constant that we all agree to follow, in order to relate the events happening in our lives to one another. Similarly, when it comes to IT infrastructure, a good IT engineer will always check the time at which a particular event happened.

Let's look at some examples.

Eg. 1: 50 of the 200 clients start experiencing connectivity issues. You (the IT support guy) start receiving emails and calls at your desk from users saying that they are unable to log in to their systems. You immediately check the monitoring system and find that one of the 4 Active Directory (AD) servers has a problem with its network card. You quickly replace the network card and the issue is resolved.

In the above example the issue is very straightforward. Yet you still had to relate the time of occurrence of the clients’ issue to an event that took place elsewhere (the AD server going down). If, in the same example, the users had started calling you at 1pm and you found that the AD server went down at 2pm, you could be sure that the loss of connectivity was caused not by AD but by something else. The timing of the events decides which troubleshooting path you take.
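The reasoning in the 1pm/2pm scenario can be sketched as a simple timestamp comparison. This is only an illustration: the helper `could_have_caused` and the dates are hypothetical, and the check is meaningful only if both timestamps come from clocks that agree with each other.

```python
from datetime import datetime

def could_have_caused(suspect_time: datetime, symptom_time: datetime) -> bool:
    """A suspected cause is plausible only if it happened at or before the symptom."""
    return suspect_time <= symptom_time

# Hypothetical timestamps matching the scenario: calls start at 1pm, AD fails at 2pm.
first_user_call = datetime(2016, 1, 1, 13, 0)
ad_went_down = datetime(2016, 1, 1, 14, 0)

print(could_have_caused(ad_went_down, first_user_call))  # False
```

If the two devices disagree about the current time, this comparison can point you in the wrong direction, which is the whole case for keeping them in sync.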

Eg. 2: We back up our files every night at 10pm by copying all the files from one server to another over the internet. This usually takes 2-4 hours. You start the backup and go home to sleep. The next morning you get a call from the office saying that there is no internet connectivity. Upon reaching the office you find that the FTP backup was successful. How do you find out when the internet failed?

This is where you turn to your monitoring tools, which collect information from the devices and can tell you which one failed. Let's say you pull up the syslogs of the gateway router and start reading them line by line. You find that the router rebooted at around 8pm the previous day, which disconnected the internet. This shocks you, because you know the internet was working at 10pm, when you started the backup. Only then do you realize that the router has been restarting regularly, and that every time it restarts it resets its clock to 12am. As a result, the timestamps the router reports are wrong.
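Without NTP, one manual workaround is to measure how far the device clock is off and shift its log timestamps by that amount. The sketch below assumes you can read the router's clock and a trusted clock at the same moment, and that the router has not rebooted between the logged event and that reading. All names and values are hypothetical and chosen only to illustrate the arithmetic.

```python
from datetime import datetime, timedelta

def clock_skew(reference_now: datetime, device_now: datetime) -> timedelta:
    """Positive when the device clock is behind the trusted reference clock."""
    return reference_now - device_now

def corrected_time(logged_at: datetime, skew: timedelta) -> datetime:
    """Shift a device log timestamp onto the reference clock's timeline."""
    return logged_at + skew

# Illustrative readings taken at the same moment (not from a real device).
reference_now = datetime(2016, 1, 2, 9, 0)   # trusted wall clock
device_now = datetime(2016, 1, 2, 2, 0)      # what the router believes

skew = clock_skew(reference_now, device_now)
logged_reboot = datetime(2016, 1, 1, 20, 0)  # "8pm the previous day" in the syslog

print(skew)                                  # 7:00:00
print(corrected_time(logged_reboot, skew))   # 2016-01-02 03:00:00
```

This correction breaks down as soon as the device restarts again and resets its clock, so it is a stopgap for reading old logs rather than a replacement for syncing the device.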

Eg. 3: A user calls to say that he is unable to open many web pages. Upon investigation you find that only https pages fail to open, while http pages load fine. With this clue you know that something related to SSL (https uses SSL) has gone wrong: the user’s computer is not accepting the SSL certificates needed to encrypt/decrypt the https traffic. The rest of the users on your network have no such issue. Digging further, you find that the system loses its time on every restart and resets its clock to 12am because of a faulty CMOS battery.

So you replace the system’s CMOS battery, set the time correctly, and find that the issue is resolved. The user was unable to open https pages because SSL certificates are valid only for a certain time frame. If your system’s clock resets to an old date (say 2010), the browser will reject a certificate whose validity starts in what it believes is the future (say 2016).
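The validity check behind this failure can be sketched in a few lines. This is a simplified illustration, not how any particular browser implements it: the function name and the one-year validity window are assumptions, and real certificate validation involves many more checks.

```python
from datetime import datetime

def certificate_is_current(not_before: datetime, not_after: datetime,
                           now: datetime) -> bool:
    """True only when `now` falls inside the certificate's validity window."""
    return not_before <= now <= not_after

# Hypothetical certificate valid for the year 2016.
not_before = datetime(2016, 1, 1)
not_after = datetime(2017, 1, 1)

# A clock that has reset to 2010 sees the certificate as "not yet valid".
print(certificate_is_current(not_before, not_after, datetime(2010, 1, 1)))  # False
# With the correct time the same certificate is accepted.
print(certificate_is_current(not_before, not_after, datetime(2016, 6, 1)))  # True
```

Nothing is wrong with the certificate in either case; only the local clock decides which answer the user gets.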

There are many other examples where a wrong clock time on one or more devices causes confusion and makes it difficult to find the problem. On the other hand, there are many examples where a proper implementation of NTP keeps all the devices synced to a common time and helps IT support guys troubleshoot better.
