Monday, November 10, 2008

Urgent Network Maintenance Tonight

One of our main network providers (AmericanIS) will be performing maintenance on the hardware and software of the equipment to which we are directly attached. This work is to resolve an issue that was discovered after a very brief event last Sunday night.

The event last Sunday night occurred when another customer of AmericanIS produced an extreme increase in load (possibly due to a Denial of Service attack or another unusual event). This negatively affected the performance of the SUP720 Route Engines in the Cisco 6509s through which we are connected to AmericanIS. This ultimately caused a brief period of packet loss and high latency for our customers whose traffic flowed through AmericanIS at the time.

The solution is to upgrade the SUP720 Engines to the most powerful Route Engines available for the Cisco 6509, the SUP720-3BXL. These engines were already on hand, and this upgrade has already been performed on other routers on the AmericanIS network that your traffic may pass through to different destinations on the Internet. However, we are directly connected to the last two routers to be upgraded.

This upgrade will be performed between 22:00 and 23:59 US/Pacific Time tonight (06:00 to 08:00 GMT, Nov 11th, 2008).
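For anyone who wants to check the window against their own time zone, here is a minimal Python sketch of the conversion. It assumes Python 3.9 or later with time zone data available, and it uses the zone name "America/Los_Angeles" for US/Pacific; the helper name `to_utc` is made up for this example.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: "America/Los_Angeles" stands in for US/Pacific.
PACIFIC = ZoneInfo("America/Los_Angeles")


def to_utc(year, month, day, hour, minute):
    """Interpret the given wall-clock time as US/Pacific and return it in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=PACIFIC)
    return local.astimezone(timezone.utc)


# Announced window: 22:00 to 23:59 US/Pacific on November 10, 2008.
window_start = to_utc(2008, 11, 10, 22, 0)
window_end = to_utc(2008, 11, 10, 23, 59)

print("Window opens: ", window_start.isoformat())
print("Window closes:", window_end.isoformat())
```

On November 10th US/Pacific is on standard time (8 hours behind GMT), so the window opens at 06:00 GMT on November 11th and closes at 07:59 GMT, which the announcement rounds to 08:00.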

During this time, the two 6509s will be upgraded one at a time. We are connected to both routers and use the HSRP fail-over protocol, so there will be only a momentary pause in network traffic through AmericanIS as each router is taken down for the upgrade. Except for the brief pauses, the network is expected to remain up.
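To illustrate why taking one router down at a time should cause only a momentary pause, here is a rough Python sketch of the HSRP election rule: among the routers that are up, the one with the highest priority becomes active, and the highest interface address breaks a tie. The router names, priorities, and addresses are invented for the example and are not the actual configuration. The sketch also ignores hello/hold timers and preemption settings, which determine how long the pause really is.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address
from typing import Iterable, Optional


@dataclass(frozen=True)
class Router:
    name: str              # hypothetical label, not a real device name
    priority: int          # HSRP priority, 0-255 (default is 100)
    address: IPv4Address   # interface address, used only as a tie-breaker
    up: bool = True


def elect_active(routers: Iterable[Router]) -> Optional[Router]:
    """Pick the active router: highest priority wins, highest address breaks ties."""
    candidates = [r for r in routers if r.up]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, r.address))


# Two hypothetical routers in one HSRP group (documentation addresses).
router_a = Router("router-a", priority=110, address=IPv4Address("192.0.2.2"))
router_b = Router("router-b", priority=100, address=IPv4Address("192.0.2.3"))

both_up = elect_active([router_a, router_b])
a_down = elect_active([replace(router_a, up=False), router_b])
b_down = elect_active([router_a, replace(router_b, up=False)])

print(both_up.name, a_down.name, b_down.name)
```

Whichever router is taken down for its upgrade, the other one is still eligible and carries the traffic, so the group always has an active router as long as only one is down at a time.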

This maintenance is being performed by one of our upstream providers to upgrade the network devices to which we are connected. The time and nature of the upgrade were chosen by them and not by M5Hosting. AmericanIS has historically provided M5Hosting with highly reliable, high-performance Internet transit service. This upgrade is expected to maintain that level of service.

As always, your feedback and comments relating to this announcement or the work it describes are encouraged.

UPDATE @ 01:30:
The upgrade being performed by AIS on the AIS network has exceeded their announced time window. There has also been a greater impact on the service than expected starting near the end of their announced window. Their very capable Network Engineers are working diligently to complete the upgrade.
While this work is being performed on a network not in our control, I have been on-site at the data center myself for the duration of the maintenance. I will remain here to handle any contingency that may require our action, to get first-hand updates, and to pass updates on to you if needed.
The original plan was for very brief interruptions to cut over from one router to another. However, there have been a few periods of packet loss and latency on their network during this maintenance. This has affected our network and service to our customers, although some routes are affected less than others.

You are welcome to contact us directly if you have specific questions not otherwise addressed.

UPDATE @ 04:30:
The network upgrade is complete and has been stable for a couple of hours now.

Thursday, August 21, 2008

Urgent "Failover" to redundant core switch @ 2am

A probable bug in the Cisco IOS running on the current active layer 3 core switch is preventing us from remotely managing the device. We have exhausted the options for restoring the functionality without interrupting the traffic flow through it.
Tonight, after 02:00 PDT we will force the redundant layer 3 core switch to take over while we address the issue with the current primary switch. The cut-over to the redundant switch should only take a few seconds. If we fail-back to the current primary within a short time, it too should only take a few seconds.

Date/Time: Thursday 8/21/08 @ 02:00 US/Pacific (GMT -7)
Duration: 5 to 30 seconds (1 minute "worst case")

As always, I invite your comments on this message, or the events it describes. Your feedback helps us deliver better services to you.


Thursday, August 14, 2008

Hardware failure in shared hosting server "Witt"

The shared hosting server called "Witt" suffered a disk failure. Replacing the disk required a few minutes of downtime. This was performed shortly after midnight tonight. The server will be a little slow at disk-intensive tasks for a few hours while the array is rebuilt and the disks are synchronized.

Update 8/14/08 17:00 PDT: Disk rebuild is complete. System IO tasks running at normal speed.

Friday, August 8, 2008

Intermittent Issues with Level 3 Communications Link

We have had a few reports, confirmed by the data center facility and our own monitoring systems, that routes through Level 3 Communications were intermittently "flapping" from 13:31 to 13:58 PDT.

We have asked the data center to keep us updated with anything they may find, as they are the direct Level 3 customer and it is through the data center facility's network that we are connected to Level 3 Communications.

If you have traceroute or MTR output which shows the issue, please send it to support at

Thanks!

Network Maintenance - 02:00 to 06:00 Aug 10th, 2008

We will be making some nifty upgrades to our network this weekend. The maintenance was previously announced here.

We will be replacing our border routers and completing some VLAN segmentation work that was left unfinished in a previous maintenance window.

Saturday, July 5, 2008

Network Maintenance - July 6th, 2008

We will be making some upgrades to the network tonight. This work will cause some brief interruptions between 12:01am PDT and 5:00am PDT on Sunday July 6th, 2008.

This maintenance was announced via broadcast email announcement. An archive of broadcast emails is maintained on the web.

If you did not receive this message, please check if it may have ended up in your spam folder, or subscribe yourself here

Sunday, June 29, 2008

What goes here

This blog is specifically to communicate timely system status information to customers of