Archive for the ‘Networking’ Category
Interesting troubleshooting with Cisco Nexus 1000V
Today, I assisted a couple of coworkers with troubleshooting a VLAN issue. The VLAN had been added to all the appropriate devices, but the VM was still unable to ping the gateway. As our troubleshooting progressed from device to device, we noticed that although the uplink port profile on the Cisco Nexus 1000V was configured correctly, the VEM on the host was not updating with the changes. This could be seen by running
vemcmd show port vlans
on the host itself. As we dug through the running config of the Cisco Nexus 1000V, we noticed that besides the port channels inheriting the uplink port profile, a VLAN was manually added to all port channels. It turns out this is a simple fix (after many hours of troubleshooting). By issuing the command
default switchport trunk allowed vlan
to all the port channels, the port channels started inheriting the VLANs from the port profile again. We verified this by adding and removing a VLAN via the port profile and confirming that the port channels updated as expected. We are not sure when this VLAN was added this way, but it did give us a bit of a headache.
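For reference, a minimal sketch of the fix on the VSM. The port channel number and the prompt are illustrative; repeat for each affected port channel:

```
! On the Nexus 1000V VSM: clear the manually-set allowed VLAN list so the
! port channel inherits VLANs from its uplink port profile again.
n1000v# configure terminal
n1000v(config)# interface port-channel 1
n1000v(config-if)# default switchport trunk allowed vlan
n1000v(config-if)# end

! Then verify on the ESXi host that the VEM picked up the change:
! ~ # vemcmd show port vlans
```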
So remember when working on the Cisco Nexus 1000V, add/remove VLANs via the port profiles, not from the port channels or ports themselves.
ESXi 5, Nexus 1000V, Custom ISO…oh my…
So a quick post today on an issue I ran into (and was also told about by @kendrickcoleman). After building a custom VMware ESXi 5 ISO with the Cisco Nexus 1000V vib, EMC PowerPath 5.7 vibs, and the vSphere HA vib, I deployed 3 hosts. After deployment, I noticed all three hosts had one CPU core spiked while the rest remained idle. It turns out that once the host is added to the Cisco Nexus 1000V dVS, CPU usage returns to normal.
Just an FYI on this topic in case other people run into it.
Reconnecting ESX(i) hosts with 1000V installed
I follow a certain policy: “You break it; you fix it.” Why? One, when you break something, you learn a lot about why it broke and what it takes to fix it. Hopefully, you also learn how to prevent it from happening again. Two, I’m not stuck dealing with a problem someone else caused.
Tonight was one where I broke something, and now I needed to fix it.
I had to reconfigure a UCS system running 3 vSphere 4.1 ESXi hosts connected to a Cisco 1000V. Before powering down the hosts, I made some changes to the 1000V and the UCS. So far, so good. All the hosts were powered down along with vCenter and the 1000V VSMs, which were running on the UCS chassis. I made my configuration changes on the UCS and powered up the first host. No connectivity. Something was not jibing between the 1000V config on the host itself and the vNIC configuration from the UCS. It turned out to be a mismatch between the native VLAN on the vNIC presented to the host and the native VLAN configured on the host by the 1000V. I returned some values to their previous settings, and the host came back up. I then updated the 1000V config, which took the host back down again, and adjusted the UCS to the new configuration I wanted, bringing the host back up. Great, except I still had two more hosts to bring back online.
This is where I dove into vemcmd on the ESXi host. When I brought up the next host, I opened a local console to the system. “vemcmd show port” showed me the native VLAN configuration error that was causing my issue. So how do you fix this? It’s actually quite simple. Looking at the output of the previous command, you’ll notice three trunk ports on the system. Two are the physical uplinks connected to the system. The third is the port channel formed between the two NICs. To bring the system back online, issue the following command: vemcmd set port-mode trunk native-vlan <native vlan> ltl <ltl of port-channel>. After issuing this command, the system was back online. You’ll notice that immediately afterward, the native VLAN of the physical NICs remains the same; once the VEM receives the updated configuration, the native VLAN is corrected on the physical NICs as well.
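From the ESXi local console, the recovery might look like this. The LTL value (305) and native VLAN (100) are illustrative only; read the real LTL of the port channel from the show output first:

```
~ # vemcmd show port
   # Identify the three trunk ports. Two map to physical uplinks (vmnics);
   # the remaining trunk port is the port channel. Note its LTL.

~ # vemcmd set port-mode trunk native-vlan 100 ltl 305
   # Sets the correct native VLAN on the port channel so the VEM can reach
   # the VSM again and pull down the proper configuration.
```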
The next chapter in my career
I do have to say, my voluntary vacation the past 6 weeks has been great. I did not travel anywhere exotic, or actually do much of anything. It was just great having downtime away from work.
Now it’s time to focus on the next chapter of my professional career, which starts on May 24th. I will be joining ACADIA as a Network Architect. I am looking forward to the opportunities and challenges that will be presented in this position and company.
I will keep this update short and head off to today’s RAoN: Running.
And a new era begins
Six months ago I decided to make a career and life change and move to Fort Worth for a new job. It was a fantastic opportunity at a small company. I never envisioned being at the company for longer than two years, but I figured the opportunities for career growth were better. Surprisingly, I also did not imagine my time at the company would be so short-lived. It is actually quite unfortunate, since I worked with a lot of great people.
There always comes a time when you have to ask yourself whether you can accept and deal with the changes, or whether it is better to find another opportunity.
I had been fighting this for the past couple of months, and today I made the decision, for both personal and professional reasons, to move on. I had made similar decisions before, but always because I was moving to a new employer for a better opportunity. This time is different. I am just moving on. And surprisingly, I am very relaxed and happy with the decision.
So what is next for me?
The last two projects I really enjoyed were designing and architecting the server refresh at my previous company, and then the implementation of the 100% virtualized data center at my now former company. Both revolved around VMware vSphere, Cisco Nexus 5000 switches, and EMC storage. I would like to pursue a position around data center design and architecture using the same or similar technologies. Along a similar path, I also have aspirations for attaining CCIE and VCDX certifications.
And that’s a wrap. I am off to enjoy the rest of my day.
Where is it? It’s got to be here, it just has to be…
I’ve been thinking about this for a while, and since this is a new year, here is a list of 9 features or changes I would like to see.
In no particular order:
9. When doing a manual VMotion, maintain the current resource pool of the VM. Only ask for a resource pool if I move the VM to a new cluster. (VMware)
8. Ability to configure one default alarm action to apply to some or all default alarms. (VMware)
7. Utilize Cisco Nexus 5000 switches as VSMs for Cisco Nexus 1000v deployments. (Cisco)
6. PowerPath support for QLogic QLE8100 series cards. (EMC; Yes, I know it’s coming, I’m just impatient)
5. Native Mac VMware vSphere client. (VMware; I’m really reaching with this one)
4. Removal of PowerPath License Server requirement (EMC)
3. Port Profiles added to the Cisco Nexus 5000 switches (Cisco)
2. vCenter Virtual Appliance (VMware)
And last, but probably the most important…
1. Ability to migrate templates (VMware)
Drops mic and walks off stage….
Campus Network Design
My role in modernizing our work network started a little over a year ago when I joined the network team. To better understand our network, think of it as a campus network: a lot of buildings, only a few of which have a large number of users, all connected together in a switched network. In total, I believe over 100 switches currently make up our network. The list of issues on our network is long:
- All users, switch management, servers, printers, etc. all on VLAN 1
- Access switches daisy chained together
- Switches handling routing that are not the core switches of the network
Now, outside of everything being on VLAN 1, some may ask what exactly is wrong with this layout. Well, for one, it’s very difficult to locate devices on the network, since all devices are on the same subnet. Thankfully, we have a Fluke OptiView that can locate devices in a large layer 2 network. The network is also prone to broadcast, unicast, and multicast flooding. In fact, for a couple of weeks we experienced unicast flooding due to a misconfiguration between our core switches and the newly-installed-but-completely-out-of-my-control layer 3 switches. (Maybe one day I’ll do a post about outsourcing and its “benefits”.)
So I’ve set out to improve the network. I am working to model it after the Cisco campus network design. The basis of this design is that access switches, routers, VPN routers, data centers, etc. connect to distribution switches, and the distribution switches connect to two core switches. The two core switches are the control center for the network. If you follow the Cisco recommendations, the core switches handle only layer 3, and the distribution switches handle layer 2 and layer 3. Recently, there has also been a design that brings routing all the way down to the access layer.
I see two issues with bringing routing all the way to the access layer in our network. First, we have a guest VLAN for our contractors and a security VLAN for our security network, each spanning the entire network as a single VLAN. VRF-lite is basically VLANs for layer 3, but I believe each network in the VRF would be in its own subnet. However, I haven’t verified this yet. If that is the case, it will increase management overhead. Second, with over 100 switches, we are talking about 200 separate networks (one data and one voice per switch). That is 200 subnets to manage DHCP addresses for, compared to the 2 we have now. This would be a huge increase in management. Granted, we could easily locate devices on the network just by performing a traceroute, and broadcast or unicast flooding would be limited to a single switch. With the access switches doing routing, you could also standardize on a similar configuration for all switches.
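The back-of-the-envelope math above can be sketched in a few lines. The switch count and VLANs-per-switch figures come straight from the post; the rest is arithmetic:

```python
# Rough scale comparison: routing at the access layer vs. today.
switches = 100          # "over 100 switches" per the post
vlans_per_switch = 2    # one data VLAN + one voice VLAN per access switch

routed_access_subnets = switches * vlans_per_switch
current_subnets = 2     # one data + one voice subnet network-wide today

print(routed_access_subnets)                      # 200 subnets with DHCP scopes
print(routed_access_subnets // current_subnets)   # a 100x management increase
```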
But do the increased efficiencies in the network offset the increased management of the network?
Let’s forget about routing at the access layer and go back to the design with the distribution switches as layer 2/3 devices. I would still have to look at using VRF-lite for the security and guest networks. With about 8 distribution points in the network, I could have a minimum of 16 VLANs (1 voice and 1 data VLAN per distribution point) and 16 subnets, but with more than one access switch per VLAN. A limit could be set, say, 5 access switches per VLAN. This would increase the number of VLANs and subnets to manage, but would reduce the number of devices in each VLAN. In this setup, though, there is still the issue of locating devices on the network. If one went with the minimum setup of 16 VLANs and 16 subnets, private VLANs could be used to isolate traffic between buildings, but even private VLANs are not without their security issues, so this would need to be taken into account.
All in all, it appears I have a daunting task ahead of me. At some point, I will have to make concessions to both manageability and efficiency in the network. This exercise has made me think about subnets and manageability versus efficiency, though. Say I wanted a different subnet for each access switch, and I install a 24-port switch. Is it really efficient to assign a /24 subnet to this switch just for management’s sake, simply because that is what most people are used to?
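To put a number on that last question, here is a quick sketch using Python’s `ipaddress` module. The /24 and the 24-port switch are from the post; the example prefix and the two-address overhead (network and broadcast) are my assumptions:

```python
import ipaddress

# A /24 assigned to a single 24-port access switch:
subnet = ipaddress.ip_network("10.0.1.0/24")  # example prefix, illustrative
usable = subnet.num_addresses - 2             # minus network + broadcast
ports = 24

utilization = ports / usable
print(f"{usable} usable addresses for {ports} ports "
      f"({utilization:.0%} utilization)")
# Roughly 9% utilization; a smaller prefix like a /27 (30 usable
# addresses) would fit a 24-port switch far more efficiently.
```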
I’ll update this with links once I figure out what’s going on with not being able to add links. I believe it may have something to do with using Safari 4 Beta.