Intro to MongoDB (part-1)

I don’t like feeling dumb. I know this is a weird way to start a blog post. I detest feeling out of my element and inadequate. As the tech world continues to inexorably advance (exponentially, even), the likelihood that I will keep running into those feelings becomes greater and greater. To try to combat this, I have a number of projects in the works to learn new products. Since this blog post has a title, and I have shortsightedly titled it after the tech I will be attempting to learn, it would be rather anticlimactic to announce it now. Jumping in….

What is MongoDB?

The first question is what MongoDB is and what makes it different from other database programs out there, say MS SQL or MySQL or PostgreSQL. To answer that question, I will need to describe a bit of the architecture. (And yes, I am learning this all as I go along. I had a fuzzy idea of what databases were and have used them myself for many things. But if you had asked me the difference between a relational and non-relational DB, I would have had to go to Google to answer you.) The two main types of databases out there are relational and non-relational. Trying to figure out the simple difference between them was confusing. The simplest way of defining it is using the relational model definition from Wikipedia: “…all data is represented in terms of tuples, grouped into relations.” The issue with this was it didn’t describe it well enough for me to understand, so I kept looking for a simpler definition. The one I found and liked is the following (found on Stack Overflow): “I know exactly how many values (attributes) each row (tuple) in my table (relation) has and now I want to exploit that fact accordingly, thoroughly, and to its extreme.” There are other differences as well, such as relational databases being more difficult to scale and to work with large datasets. You also need to define all the data you will need before you create a relational DB; unstructured data or unknowns are difficult to plan for, and you may not be able to.

So, what is MongoDB then? Going back to our good friend Wikipedia, it is a cross-platform, document-oriented database program. It is also defined as a NoSQL database. There are a number of different types of NoSQL databases though (this is where I really start feeling out of my element and dumb). There are:

  1. Document databases – These pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, key-array pairs, or even nested documents. (See the example document just after this list.)
  2. Graph stores – These are used to store information about networks of data, such as social connections.
  3. Key-value stores – These are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or key) together with its value.
  4. Wide column stores – These are optimized for queries over large datasets, and store columns of data together instead of rows. One example of this type is Cassandra.
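To make “document” a bit more concrete, here is a quick sketch of what one looks like, using Python’s pymongo driver. This is only an illustration; the connection string, database, and field names are all made up.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["blogdemo"]

customer = {
    "name": "Ada Lovelace",                # simple key-value pair
    "emails": ["ada@example.com"],         # key-array pair
    "address": {                           # nested document
        "city": "London",
        "country": "UK",
    },
}
db.customers.insert_one(customer)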

Why NoSQL?

In our speed-crazed society, we value performance. Sometimes too much. But still, performance. SQL databases were not built to scale easily or to handle the amount of data that some orgs need. To this end, NoSQL databases were built to provide superior performance and the ability to scale easily. Things like auto-sharding (distribution of data between nodes), replication without third-party software or add-ons, and easy scale-out all add up to high-performing databases.

NoSQL databases can also be built without a predefined schema. If you need to add a different type of data to a record, you don’t have to recreate the whole DB schema; you can just add that data. Dynamic schemas make for faster development and less database downtime.
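As a quick sketch of that flexibility (continuing the pymongo example above, again with made-up field names), two documents in the same collection can have completely different shapes and MongoDB will happily store both:

products = db["products"]
products.insert_one({"sku": "abc-123", "name": "Widget", "price": 9.99})
# Later, a new kind of product needs extra fields - no schema migration required.
products.insert_one({"sku": "xyz-789", "name": "Gadget", "price": 24.99,
                     "dimensions": {"w": 10, "h": 4}, "tags": ["new", "featured"]})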

Why MongoDB?

Data Consistency Guarantees – Distributed systems occasionally get a bad rap for eventual data consistency. With MongoDB, this is tunable, down to individual queries within an app. Whether something needs to be near instantaneous or has a more relaxed need for consistency, MongoDB can do it. You can even configure replica sets (more about those in a bit) so that you can read from secondary replicas instead of the primary for reduced network latency.
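Here is a hedged sketch of what that tuning looks like from the driver side with pymongo. The hostnames and replica set name are placeholders; the point is that read preference and write concern are set per connection or per collection, not cluster-wide.

from pymongo import MongoClient, ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0")
db = client["blogdemo"]

# Reads that can tolerate slightly stale data go to a nearby secondary...
reporting = db.get_collection("orders", read_preference=ReadPreference.SECONDARY_PREFERRED)
# ...while writes that must be durable wait for a majority of the replica set to acknowledge.
orders = db.get_collection("orders", write_concern=WriteConcern(w="majority"))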

Support for multi-document ACID transactions as of 4.0 – So I had no idea what this meant at all. I had to look it up. What it means is that if you needed to make a change to two different documents (or collections) at the same time, as a single all-or-nothing operation, you were unable to before 4.0. NOW you are able to do both at the same time. Think of a shopping cart and inventory. You want to remove the item from your inventory as the customer is buying it. You would want to do those two transactions at the same time. BOOM! Multi-document transaction support.
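A hedged pymongo sketch of that shopping-cart example follows (MongoDB 4.0+ on a replica set; collection and field names are invented). Both writes commit together or not at all:

with client.start_session() as session:
    with session.start_transaction():
        db.inventory.update_one(
            {"sku": "abc-123", "qty": {"$gte": 1}},
            {"$inc": {"qty": -1}},
            session=session,
        )
        db.orders.insert_one(
            {"sku": "abc-123", "qty": 1, "status": "purchased"},
            session=session,
        )
# If either operation fails, the transaction is aborted and neither change is applied.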

Flexibility – As mentioned above, MongoDB documents are polymorphic. Meaning… they can contain different data from other documents with no ill effects. There is also no need to declare anything, as each document is self-describing. However… there is such a thing as schema governance. If your documents MUST have certain fields in them, schema governance can step in and impose structure to make sure that data is there.

Speed – Taking what I talked about above a bit further, there are a number of reasons why MongoDB is much faster. Since a single document is the place for reads and writes for an entity, pulling data usually only requires a single read operation. The query language is also much simpler, further enhancing speed. Going even further, you can build “change streams” that allow you to trigger actions based on configurable stimuli.
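Change streams are easier to picture with a small sketch. This one (pymongo again, with an invented collection name and filter) prints every newly inserted order as it happens:

# Watch the orders collection and react whenever a new document is inserted.
pipeline = [{"$match": {"operationType": "insert"}}]
with db.orders.watch(pipeline) as stream:
    for change in stream:
        print("New order:", change["fullDocument"])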

Availability – This will be a little longer since there is a bit more meat on this one. MongoDB maintains multiple copies of data using a technology called replica sets. They are self-healing and will auto-recover and fail over as needed. The replicas can be placed in different regions, as mentioned previously, so that reads can come from a local source, increasing speed.

In order to maintain data consistency, one of the members assumes the role of primary. All others act as secondaries, and they replay the operations from the primary’s oplog. If for some reason the primary goes down, one of the secondaries is elected primary. How does it decide, you may ask? I’m glad you asked! It does so based on who has the latest data (determined by a number of parameters), who has connectivity with the majority of the other replicas, and, optionally, user-defined priorities. This all happens quickly and is decided in seconds. When the election is done, all the other replicas start replicating from the new primary. If for some reason the old primary comes back online, it will automatically discover its role has been taken and will become a secondary. Up to 50 members can be configured per replica set.

Well, that’s enough for one blog post as I’m at about 1,200 words already. The next post will continue with sharding and more MongoDB goodness.

Windows Bare Metal Recovery on Rubrik’s Andes 5.0

Rubrik Andes 5.0 is out! There are so many features that have been added and improved upon. One of the many things that has me excited is Bare Metal Recovery. While virtualization has pretty much taken over the enterprise world, there are still reasons to have physical machines. Whether a workload is unable to be virtualized or its presence on a physical machine is a business requirement, there still exists a need for physical protection. So I wanted to do a quick walkthrough to share how Rubrik has improved its ability to perform bare metal recovery (and to go through it myself!). I won’t go into how to create SLA policies, or what they mean, since there are plenty of good resources (https://tinyurl.com/yb9k63ax). Let’s get started!

In order to create the PE boot disk, you will need to download the ADK from Microsoft. This will download the PE environment. What is needed is a boot CD that gives you a PowerShell prompt and network access. You can download the Microsoft ADK from this page: https://docs.microsoft.com/en-us/windows-hardware/get-started/adk-install or from the download link here: https://go.microsoft.com/fwlink/?linkid=2026036. What you are downloading is the setup file that will then download everything else. All the files downloaded will probably total around 7GB. I believe you should be able to use a Windows Server install CD and go into recovery mode instead, but I have not tested this yet.

Once you download the ADK and install that, you will need to download Rubrik’s PE install tool. This is a PowerShell script that will aggregate the files needed and compile them into an ISO that can then be used on a USB key etc.

Download the PE install from Rubrik’s Support site: https://www.rubrik.com/support/

(If you don’t already have one, you will need a username and password to log in)

Once downloaded, you need to unzip the files directly to your C: drive.

Two folders will be created.

Now open an admin PowerShell prompt and run the following (you may need to adjust the beginning to account for where the .ps1 file is):

.\CreateWinPEImage.ps1 -version 10 -isopath C:\WinPEISO -utilitiespath C:\BMR

It will then begin running the ISO creation process.

At the end of this, a bootable ISO will be created in the location you specified in the PowerShell command above (C:\WinPEISO).

You will use this ISO later on to boot the physical machine you are restoring.

Normally the next parts would already be done, since you are interested in the restore. I put this in though, since I was creating everything as I was writing this post and someone out there might not be aware of how to protect the machine in preparation for BMR.

After the machine has been installed with Windows and any other software needed, you should install the Rubrik Connector on the machine and add it to the Rubrik cluster. To do that, log into the Rubrik cluster GUI, click on Windows Hosts, and then Add Windows Hosts, as shown in the picture.

You are then presented with a popup window where you can download the Rubrik Connector and install it. After it is installed, click “Add”

The next step is to install the Volume Filter Driver. You can do that by clicking on the checkbox in front of the machine and then clicking on the ellipsis in the top right corner. To be clear, you don’t “need” the Volume Filter Driver for this to work, but it does a better job of keeping track of changes and should allow you to take faster incrementals. It performs the same job as RCT in Hyper-V or CBT in VMware.

Click on the checkbox in front of the machine. Then click on the ellipsis, and click on “Install VFD”

The host will need to be restarted. It is really easy to tell whether the VFD is installed from the column on the right side of the host list (it may take a few minutes for Rubrik to register the fact that you rebooted). Once the host is restarted, you will need to take a snapshot of the machine. You can do this by clicking on the Windows host name itself and then clicking “Take On Demand Snapshot”.

This will launch a short wizard to take the snapshot

I’m going to click on Volumes and then the ‘C’ drive. The next screen is to add the machine to an SLA protection policy. You don’t have to, but you should. This will keep it continually protected according to the SLA you choose. Click on “Finish” and watch the magic occur.

So in case you were wondering… my first error above was because the machine was not part of a domain. Some of the permissions required need the machine to be part of a domain.

Once the backup has been completed, you will see a dot on the calendar telling you that a snapshot exists on that day.

In order to restore this, click on that dot. You will then see the snapshots available. At the end of the line there is an ellipsis you can click on to take action on that snapshot.

Now you CAN choose individual files with the “Recover Files” option, but that won’t help you perform a Bare Metal Restore. The option you are looking for is “Mount”. When you choose “Mount” you will get a new popup showing the drives available. Select the C: drive and any others you need, then click “Next”. The next window gives you more options. You can either mount the snapshot on a host (if you are just restoring a volume) or, since we are doing a bare metal restore, click on No Host at the bottom to expose an SMB share with the VHDX file.

In order to preserve security around the share, you will need to enter who is allowed to access it. You do that by entering an IP address; ONLY the IP addresses you input will be able to access the share. You probably don’t know what IP address you need to put in here yet, so start your physical machine up with the PE CD and then use ipconfig in the open command-line window to find the IP to use.

After the drive in Rubrik is mounted, you can find it by going to the Live Mount menu on the side and selecting Windows Volumes. When you hover over the name, it will give you the option of copying the SMB share to your clipboard. When you move your mouse down to the Path all you need to do is click on it to add to your Clipboard.

The bottom image is what the share will show in case you’re curious.

Since your machine has already been started with the PE ISO, the next step is to run the PowerShell command to begin the restoration process. The PowerShell command is shown to you in the window shown above, but here is an example of what you might use:

powershell -executionpolicy bypass -file \\192.168.2.52\xm5fuc\without_layout\RubrikBMR.ps1

The part in blue (the share path) is specific to that mount and needs to be changed. Once you hit enter, a flurry of activity will happen and you will see it copying the data over. Grab a coffee or a nice single malt scotch; either will do depending on the time of day. I have a video below of what happened after I clicked enter. It restored 22 GB of files on the C:\ drive in about 15 minutes. This was over a 1Gb Ethernet connection and using a single virtual node. In closing, I love this feature and the additional refinements made in this version.

Don’t Backup. Go Forward.

3 Months in…..

Slightly over 3 months in now at my first role as Technical Marketing Engineer with Rubrik, Inc. and I couldn’t be happier. The job itself brings new things often enough that I don’t feel bored. And my team is amazing; I couldn’t ask for a more supportive group of people. The more I work with them, the better it gets. The breadth of knowledge and insight they bring to the table helps me immensely. As I’m sitting here at my computer on a Friday night feeling thankful, I thought I would do a quick recap of some of the projects I’ve already been working on and things I’ve done.

First month:

Started off with bootcamp and began learning about the product. Like past roles with most of the companies I’ve worked for, this was once again like drinking from a firehose. To be clear, I do not feel like I have even scratched the surface of the products, even after 3 months. Along with trying to ramp up on the product and forgetting countless names of co-workers (yeah, I’ve got an exceedingly bad memory for names and I definitely apologize to all those whose names I’ve forgotten… ever), I also attended VMworld. While there, I caught up with countless friends and started experiencing the high regard that people hold for Rubrik. I was able to meet customers who also loved the product, and some even went as far as sharing/paying for cab rides with me and of course dinners. I was able to help set up the Rubrik demo labs and felt like I was able to start contributing to the team. I also did my first-ever presentation at VMworld, in conjunction with Pure Storage.

Second month:

Second month, I started warming up with presentations, with a product demo and a webinar. Both helped calm some of my jitters about presenting in front of people. I’ve always been nervous about presenting in front of people and struggled with impostor syndrome. But I also hate that feeling, and that was a major reason why I wanted to be (and was) a VMware certified instructor while working at Dell EMC, and a large reason for moving to this role. I’ve always been decently introverted and have worked hard to try to come out of my shell. The community has been a large part of what has made it easier for me to do so. During the month, I ended up back at HQ for team meetings and more team-building activities. This is one of the first teams that I’ve worked for that has truly worked hard at bringing their employees together and becoming a family. To end the month out, I started preparing for my first VMUG presentation.

Third month:

The third month I traveled to Phoenix, AZ for the UserCon there. I gave my first presentation to a… not packed house. It was actually better this way in my opinion, since it allowed me to work into this sort of thing a bit easier. I felt more like it was a conversation, and I tried to get the attendees more involved in the presentation with me instead of it just being a slideshow. The last part of the month was finished off by going back to HQ to work in the lab. I’ve always loved lab work as it presents a clear problem or goal and you can concentrate on that instead of needing to define things first. I admit freely that I’ve confined myself in that sort of environment for too long though, and need to work on my creative side. Which is why we are going to try to blog more.

So what’s next on the agenda? First thing, I have a vacation planned. First one in 5 years where I am actually going somewhere. Heading to a remote cabin in Colorado to spend a week. Get some creative juices flowing again and some rest. I will be visiting a few friends from VMware up there and enjoying a few libations with them. Hopefully the ideas for some blog posts will show up and I’ll begin writing those. After that, I’m still doing a ton of learning. Trying to get a lot better at my Spanish and Rubrik’s products along with the products we support (Azure, Hyper-V, AWS, etc.). I’m sure there will be more information coming out of those. To be continued….

VMworld 2018 post-summary

Wow, so there was a ton of activity that happened last week. VMworld 2018 US edition has now passed and was amazing. This particular one was pretty sweet for me as this marked a number of firsts for me. While I’ve been before, this is the first time I’ve played a role other than just visiting sessions and HOL’s. While that was enjoyable and a great learning experience, being able to experience the setup, breakdown and behind the scenes of what goes on for a company’s booth, was completely eye-opening. The sheer amount of work involved was completely exhausting. Not to mention the work continued after hours as well. There were parties, customer dinners, and planning sessions non-stop. I can’t even begin to say how much I enjoyed working with the Rubrik marketing team and also being able to socialize with all the great community that is always there at these events. But what actually went on? I will describe some of the activities I was able to be part of, but also some of the highlights that happened.

Saturday – I arrived mid-morning and was able to get to my hotel, through check-in, and back to the expo around 10:30-11am. This is where some of the work began for our team. I helped setup the servers and environment for the booth that would be used for demos. Other members of our team were already there and working hard before I even got there. The expo floor looks really weird at this point as there is not much put together and just lots of equipment and building blocks lying around. While the construction crew worked on the booth itself, we continued working on the demo environment until about 6ish (with the 2hr time change for me, ended up being a long day having started around 5am CST). We were well taken care of as most nights we had dinners already planned for us.

Sunday – We continued working on finishing the demo environment and worked on setting up the demo stations. The construction on the booth was nearing completion and things were really starting to take shape. As a side note, the team that worked on our booth did really well, considering I think our booth was one of the best-looking and most ambitious ones there – no bias of course. Everything was ready to go when the expo floor opened up at 5pm for the Welcome Reception. The welcome reception went well and I was able to mill around a bit finding friends I haven’t seen for a while. After dinner I pretty much passed out.

Monday – This was another great day: lots of check-ins throughout the day back at the booth, seeing great friends, and getting ready for that night. I had my first ever booth presentation at the Pure booth as well. It had been a while since I’d spoken in front of strangers in this capacity, so it was a bit unnerving. In full disclosure, even when I was an instructor at Dell, I was still a bundle of nerves. I’ve always been a bit of an introvert but am constantly working on trying to change that. What made it even more exciting was that I was allowed to raffle a couple of VIP passes to bypass the line getting into our party later that night. The presentation went well and I was able to present Rubrik’s tech, and how we integrate with Pure, to about 50 attendees.

Moving on from there we had the big party that night. Run DMC and The Roots were the main attraction. Even the DJ music leading up to it was good. Everyone had a lot of fun and we ended up with about 1500+ attendees for the party. There were large lines waiting to get in so the employee bands came in handy.

Tuesday – Recovering from the night before was a little difficult, but I was able to get up and check on the demo machines to make sure everything was running smoothly for the demos. Then I went to see more people I haven’t seen in forever. The evening was taken up with team meetings and other fun stuff.

Wednesday – Brought an end to the solutions expo. That meant we could start packing everything up. Which we did. We ended up needing to run some of it over to the next day, but we were able to get the majority of the equipment turned off and organized for packing. Later that night I went to what started as a LAN party but ended up as a Cards Against Humanity game. There may have been a few incidents that involved security being called.

Thursday – We finished up and then I was able to grab a flight out at 1:50pm and made it home around 9pm-ish. I ended up inside for the weekend as I caught some sort of flu or cold bug (yay, planes and conferences) and am still trying to get over it as I’m writing this. Some of the things I enjoyed as far as announcements:

Announcements:

20th Anniversary for VMware!

Tattoos on Pat G./Sanjay P./Yanbing Li. – Though the permanence of some of them is questioned

vSphere 6.7 Update 1 – This brings a bunch of updates, most notably a fully featured HTML5 client and vMotion and snapshot capabilities for vGPUs.

vSphere Platinum Edition – This new licensing includes AppDefense

New versions of vRealize Operations (7.0) and Automation (7.5)

Amazon RDS on vSphere – Relational DBs managed by RDS running on VMware. This will allow companies to run RDS and not have to worry about the management of it. Management can be done through a single, simple interface. You can also use it to create a hybrid setup between on-site and cloud, enabling all sorts of use cases. SQL Server, Oracle, PostgreSQL, MySQL, and MariaDB will all be supported.

VMC on AWS expansion to the Asia Pacific (Sydney) Region – This means VMware’s presence now extends to all major geographies.

Lower price of Entry for VMC on AWS – 3 Host min, license optimization for MS/Oracle apps. There is also a single host SDDC to test with and play around with. (This was intro’d a bit before VMworld.) You can specify host affinity for VMs and number of cores that an application requires.

vSAN on EBS – Scale from 15-35TB per host in increments of 5TB.

Accelerated live migration – VMware HCX now allows you to migrate just about any VM from on-premises to VMC

Project Dimension – Combines VMware Cloud Foundation (in HCI) with a cloud control plane. So far this is looking something like Azure Stack, where VMware will take care of the hardware and software patching for the SDDC and the customer worries about apps at the customer site.

ESXi on 64-Bit ARM – details are still light.

These are not every single one of the announcements but the ones I most relate to.

My info was sourced from the following places and …. Being there.

https://www.vmware.com/radius/vmworld-2018-innovation/

https://www.cio.co.nz/article/645860/amazon-relational-database-service-on-vmware-launched-at-vmworld/

https://www.forbes.com/sites/patrickmoorhead/2018/09/04/aws-dell-arm-and-edge-announcements-dominate-vmworld-2018/#31ffd25536c4

New Beginnings

It was with a bit of regret and a small bit of fear that I turned in my 2 weeks’ notice last week. Even though I technically left Dell 2.5 yrs. ago, Dell wasn’t done with me yet and decided to buy the company I moved to. So essentially, I have worked for Dell in some capacity for the last 6 yrs. During that time, I did a bit of everything from front-line phone tech to VMware Certified Instructor. I learned a ton about IT that you never really see until you work in larger environments and made some great life-long friends. I really enjoyed teaching and the feeling that I may have helped my students along in their careers, and because of that, I decided to get more into the education side of IT. To do this, I moved over to EMC to be a Content Developer for the Enterprise Hybrid Cloud solution (1 month after I joined EMC, Dell announced the buyout and I once again became a Dell employee). I helped develop classes for that for a while before going down the path of Lab Architect.

Shortly after I started the Lab Architect role, I was approached with the possibility of blending all the things I love into a single position, with the sweet addition of getting paid for it as well: the training, talking with customers, building POCs, and blogging. I love the idea of trying to help people with the work that I do, and as I get older (ugh) I personally feel that I need to make more of a difference by trying to help people. I believe this position will allow me to do that. I greatly appreciate all the help that everyone has given me up to now and continuing. The VMware community is one of the best communities I’ve ever been a part of and, God willing, will continue to be part of for a long while.

Putting this in a different paragraph for the TL;DR crowd: I have accepted a new role as Technical Marketing Engineer for Rubrik. My last day with Dell/EMC is 7/27. I am looking forward to working with a team of people whom I greatly admire and respect. I have a ton of catch-up and work to do in the coming months and pray they have the patience for me. I am extremely excited not only about the people that I get to work with but also about the product as well. Rubrik has some really cool technology, which I plan on delving way deeper into, and it seems like they really have an awesome vision on how to handle data to make it really easy to manage and control. I look forward to what’s coming….

VCIX-NV Objective 2.2 – Configure and Manage Layer 2 Bridging

Underneath today’s objective we have the following topics:

  • Add Layer 2 Bridging
  • Connect Layer 2 Bridging to the appropriate distributed virtual port group

So first let’s do a little background on what we are doing and, more importantly, the why. Layer 2 bridging, in this case, is an ability we have in NSX to take a workload that currently only exists in our NSX world and “bridge” it with the outside world. We could use this to reach a physical server being used as a proxy or gateway, for example. We create this bridge between a logical switch inside NSX and a VLAN-backed port group that routes out to the physical network. I am going to borrow the picture from the NSX guide to try to simplify it a bit more. (Credit to VMware for the pic.)

In the above picture you have the NSX VXLAN 5001 that is running our workload inside ESXi. We have a need to communicate to a physical workload labeled as such. In order to do that, we have an NSX Edge logical router that has L2 bridging enabled on it. The Bridging tab itself will allow us to choose a logical switch and distributed port group that will be connected together. To do this here are the following steps:

  1. If you don’t already have one, you will need to deploy a new Logical Router. To do that, you will need to go to the NSX Edges subsection of the Networking and Security screen of NSX.
  2. Click on the green + icon in the middle pane.
  3. The first information you will need to fill out is the Install Type and Name. The rest of the options we won’t go over in this walkthrough.
  4. We will need to select Logical Router as the Install Type and then type in a name.
  5. On the next screen, we will need to input a password for the Edge device.
  6. On the Configure Deployment Screen, we will need to add an actual appliance here by clicking on the green + icon.
  7. This pops up a screen for us to select where we wish to place the device’s compute and storage resources.
  8. On the Configure Interfaces screen, I’ve chosen to connect it to my management network. You don’t really need to configure an interface as the actual bridging work will be done by a different mechanism.
  9. You can click past the Default Gateway screen.
  10. Click Finish on the Ready to Complete screen and away you go.

Now the actual bridging mechanism is found by going into the Edge itself (there is also an API sketch after these steps):

  1. Double click on the Edge device you are going to use for bridging.
  2. Click on Manage, then on Bridging tabs in the center pane.
  3. To add a bridge, click on the green + arrow
  4. Give the Bridge a name, select the Logical Switch you are bridging, and the VLAN Port Group you will be bridging to. (Just as a side note, none of the normal dvPortGroups will show up unless you have a VLAN assigned to them. Something I discovered while writing this.)
  5. Once you click Ok, you will exit out to the Bridging screen again, and you will now need to publish your changes to make it work.
  6. Once published, you will have a Bridging ID
  7. You can have more than one Bridge using the same Edge device, but they won’t be able to bridge the same networks. In other words, you can’t use a bridge to connect the same VXLAN to two different VLAN Port Groups.
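If you would rather script this than click through it, the same bridge can be pushed through the NSX REST API. The sketch below is a rough Python example based on my recollection of the NSX-v API guide, so treat the endpoint, XML payload, and the edge/virtualwire/dvportgroup IDs as assumptions to verify against the API documentation for your version:

# Hedged sketch only - endpoint and payload are assumptions from the NSX-v API guide,
# and the IDs below are placeholders for your environment.
import requests

nsx_mgr = "https://nsxmanager.lab.local"
edge_id = "edge-1"

bridge_config = """
<bridges>
  <enabled>true</enabled>
  <bridge>
    <name>Objective22-Bridge</name>
    <virtualWire>virtualwire-5</virtualWire>
    <dvportGroup>dvportgroup-20</dvportGroup>
  </bridge>
</bridges>
"""

resp = requests.put(
    f"{nsx_mgr}/api/4.0/edges/{edge_id}/bridging/config",
    data=bridge_config,
    headers={"Content-Type": "application/xml"},
    auth=("admin", "password"),
    verify=False,  # lab only - NSX Manager typically uses a self-signed certificate
)
resp.raise_for_status()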

And that covers this objective. Stay tuned for the next objective!

Mike

Pre-Filled Credentials for vSphere 6.5+ Web/HTML5 client

So I can’t really take any credit for this blog post, as the original work was all done by William Lam. I have my own homelab and also maintain a few labs at work that are hidden off in their own networks, so this little trick comes in really handy. Mainly because I have quite a few environments to log into, and it makes it simple when I don’t need to remember which domain they are under. The location of the file has changed under 6.5 and 6.7, so I just figured I would update his original post with the location in the new versions.

The file that needs to be modified is unpentry.jsp. In version 6.0 the file is located at /usr/lib/vmware-sso/vmware-sts/webapps/websso/WEB-INF/views/unpentry.jsp. In the new versions the file is located at /usr/lib/vmware-sso/vmware-sts/webapps/ROOT/WEB-INF/views/unpentry.jsp.

When you use vi to open the file on the VCSA (assuming that’s what pretty much everyone is using these days), the area to be modified is the same as in William’s original post: the username and password input fields each get a value pre-filled with your login details.

Obviously, the actual login info will match your environment. Once those lines are modified and saved, you will see the wonderful pre-filled login screen when pulling up your environment.

You may need to click on the fields for the Login button to light up, but hey….no more typing username and passwords in!

Thanks again to William for the info. Now if we could just get a skin creator/ theme engine for the HTML5 client………

VCIX-NV Objective 2.1 – Create and Manage Logical Switches

Recovering from dual hernia surgery and changing job roles…….it’s me and I’m back. Moving back into the Blueprint, we are working on Objective 2.1 – Create and Manage Logical Switches. We will be covering the following points in this blog post.

  • Create and Delete Logical Switches
  • Assign and configure IP addresses
  • Connect a Logical Switch to an NSX edge
  • Deploy services on a Logical Switch
  • Connect/Disconnect virtual machines to/from a Logical Switch
  • Test Logical Switch connectivity

First it would probably be appropriate to make sure that we know what a logical switch can do. Just like its physical counterpart, an NSX logical switch creates a broadcast domain and segment. This keeps broadcasts on one switch from spilling over to another, saving network bandwidth. You could feasibly argue that this virtual network bandwidth is a bit more precious than physical network bandwidth, because overlay traffic requires not only real network bandwidth but also processing on the hosts (whereas normal network traffic would be processed by the ASIC on the physical network switch).

A logical switch is mapped to a unique VXLAN which then encapsulates the traffic and carries it over the physical network medium. The NSX controllers are the main center where all the logical switches are managed.

In order to add a logical switch, you must obviously have all the needed components set up and installed (NSX Manager, controllers, etc.). I am guessing you have already done that. (If you would rather do this through the API, there is a sketch after the steps below.)

  1. In the vSphere Web Client, navigate to Home > Networking & Security > Logical Switches.
  2. If your environment has more than one NSX Manager, you will need to select the one you wish to create the switch on, and if you are creating a Universal Logical Switch, you will need to select the primary NSX Manager.
  3. Click on the green ‘+’ symbol.
  4. Give it a name and optional description
  5. Select the transport zone where you wish this logical switch to reside. If you select a Universal Transport Zone, it will create a Universal Logical Switch.
  6. You can click Enable IP Discovery if you wish to enable ARP suppression. This setting, which is enabled by default, minimizes ARP flooding on the segment.
  7. You can click Enable MAC learning if you have VMs that have multiple MAC addresses or Virtual NICs that are trunking VLANs.

The next point, assign and configure IP addresses, is a bit confusing. There is no IP address you can “assign” to just the logical switch. There is no interface on the switch itself. What I am guessing they meant to say here was that you should be familiar with adding an Edge Gateway interface to a switch, and adding a VM to the switch. Both of these would in a roundabout way assign and configure a subnet or IP address to a logical switch. That’s the only thing I can think of anyways.

The next bullet point is, connecting a logical switch to an NSX Edge. This is done quickly and easily.

  1. While you are in the Logical Switches section (Home > Networking & Security > Logical Switches), you would then click on the switch you want to add the Edge device to.
  2. Next, click the Connect an Edge icon.
  3. Select the Edge device that you wish to connect to the switch.
  4. Select the interface that you want to use.
  5. Type a name for the interface
  6. Select whether the link will be internal or uplink
  7. Select the connectivity status. (Connected or not)
  8. If the NSX Edge you are connecting has Manual HA Configuration selected, you will need to input both management IP addresses in CIDR format.
  9. Optionally, edit the MTU
  10. Click Next and then Finish

The next bullet point covers deploying services on a logical switch. This is accomplished easily by:

  1. Click on Networking & Security and then click on Logical Switches.
  2. Select the logical switch you wish to deploy services on.
  3. Click on the Add Service Profile Icon.
  4. Select the service and service profile that you wish to apply.

There is an important caveat here: the icon will not show up unless you have already installed the third-party virtual appliance in your environment. Otherwise your installation will look like mine and not have that icon.

The next bullet point, Connecting and Disconnecting VMs from a Logical Switch is also simply done.

  1. While in the Logical Switch section (kind of a theme here huh?), right click on the switch you wish to add the VM to.
  2. You have the option to Add or Remove VMs from that switch – as shown here in the pic

The final point, testing connectivity, can be done numerous ways. The simplest way would just be to test a ping from one VM to another. This could be done on pretty much any VM with an OS on it. You can even test connectivity between switches, provided there is some sort of routing set up between them. If you only had one VM on that segment (switch) but you had an Edge on it as well, you could ping the Edge interface from the VM. There are many ways to test connectivity. And with that, this post draws to a close. I will be back soon with the next objective, 2.2 Configure and Manage Layer 2 Bridging.

Tales of a Small Business Server restore……

I know that many of you have gone through your own harrowing tales of trying to bring environments back online. I always enjoy hearing about these experiences. Why? Because these are where learning takes place. Problems are found and solutions have to be found. While my tale doesn’t involve a tremendous amount of learning per se, I feel there are a few things I did discover along the way that may be useful for someone who has to deal with this later. So let’s begin the timeline.

Backstory

The current server is a Microsoft Small Business Server 2011. This server serves primarily as a DNS/File/Exchange server. It houses about 300-400GB of Exchange data, and about 700GB of user data. Now this machine is normally backed up using a backup product called Replibit. This product uses an onsite appliance to house the data and stage it for replication to the cloud. So theoretically you will have a local backup snapshot and a remote-site backup. As backups always somehow have challenges associated with them, this seems like an appropriate amount of caution. The server itself is a Dell and is more than robust enough to handle the small business’ needs. There are other issues I would be remiss not to mention, like the fact that the majority of the network is on a 10/100 switch, with the single gigabit uplink being used by the SBS server.

Sometime in the wee hours of the morning on Wednesday….

This was when the server was laid low. Don’t know what exactly caused it, as I haven’t performed a root cause analysis yet, and it’s unlikely to happen now. For the future I will be recommending a new course direction for the customer, as I believe there are better options out there now (Office365, standard Windows Server).

I believe that there was some sort of patch that may or may not have happened around the time the machine went down. Regardless, the server went down and did not come back up. It would not even boot in Safe Mode. It would just continually reboot as soon as Windows began to load. Alerts went off notifying us of the outage, and the immediate action taken was to promote the latest snapshot to a VM on the backup appliance. This is one of the nice features that Replibit allows. The appliance itself runs on a customized Lubuntu distro and virtualization duties are handled by KVM. The VM was started with no difficulty, and with a few tweaks to Exchange (for some reason it didn’t maintain DNS forwarding options), everything was up and running.

After 20 minutes of unsuccessfully trying to get the Dell server to start in Safe Mode, Last Known Good Configuration, or any other mode I could, I decided my energies would be better spent just working on the restore. Since the users were working fine and happily on the VM, the decision was made to push the restore to Saturday to minimize downtime and disruption.

Saturday 8:00am…….

As much as I hate to get up early on a Saturday and do anything besides drink coffee, I got up and drove to the company’s office. An announcement had been made the day before that everyone should be out of email, the network, etc. Then we proceeded to shut down the VM. Using the recovery USB, I booted into the recovery console and attempted to start a restore of the snapshot that the VM was using to run. I was promptly told “No” by the recovery window. Reason? The iSCSI target could not be created. This being the first time I had used Replibit personally, I discovered how it works: the appliance creates an iSCSI target out of the snapshot, then uses that to stream the data back to the server being recovered. Apparently when we promoted the snapshot to a live VM, it created a delta disk with the changes from Wednesday to Saturday morning. The VM had helpfully found some bad blocks on the 6-month-old 2TB Micron SSD in the backup appliance, which corrupted the snapshot delta disk. This was not what I wanted to see.

With the help of Replibit support, we attempted everything we could to start the iSCSI target. We had no luck. We then tried creating an iSCSI target from the previous snapshot. This worked. This was a problem, however, because we would lose 3.5 days of email and work. Through some black magic and a couple of small animal sacrifices, we mounted the D drive of the corrupted snapshot with the rest of the week’s data (somehow it was able to differentiate the drives inside the snapshot). I was afraid, though, that timestamps would end up screwing us with the DBs on the server. Due to the lack of any other options, we decided to press forward. The revised plan now was to restore the C drive backup from Tuesday night and then try to copy the data from the D drive of the snapshot using WinSCP. We started the restore at about 11am on Saturday. We were restoring only 128GB of data, so we didn’t believe it would take that long. The restore was cooking at first, 200-350MB/min. But as time wore on… the timer kept adding hours to the estimate and the transfer rate kept dropping. Let’s fast forward.

Sunday 9:20pm

Yes… 30+ hours later for 130GB of data, and we were done with just the C drive. At this point, we were sweating bullets. The company was hoping to open as usual Monday morning, and with those sorts of restore times, it wasn’t going to happen. (I would like to send a special shout-out to remote access card manufacturers, Dell’s iDRAC in this case, without which I would have been forced to stay onsite during this time, and that wouldn’t have been fun.) Back to the fun. The first thing now was to see if the restore worked and the server would come up. I was going to bring it up in Safe Mode with Networking, as the main Exchange DB was on the D drive and I didn’t want the Exchange server to try to come up without that. Or any other services that also required files on the D drive, for that matter.

The server started and F8 was pressed. “Safe Mode with Networking” was selected and fingers were crossed. The startup files scrolled all the way down through Classpnp.sys and it paused. The hard drives lit up and pulsed like a Christmas tree. 5 minutes later the screen flashed and “Configuring Memory” showed back up on the screen. “Fudge!” This is what happened before the restore, just slower. Rebooted, came back to the selection screen, and this time just chose “Safe Mode”. For whatever reason, the gods were smiling on us and the machine came up. The first window up by my hand was a command prompt with an SFC /scannow command run. That finished with no corrupt files found (of course), so I moved on. I then re-created the D drive, as the restore had overwritten the partition table when the C drive was restored. I had no access to the network of course and needed that to continue with the restoration process. Rebooted again and chose “…with Networking” again. This time it came up.

Now we moved on to the file copy. The D drive was mounted in the /tmp folder on the Linux backup appliance (just mounted, mind you, not moved there). We connected with WinSCP, chose a couple of folders, and started the copy. Those folders moved fine, so on to some larger ones… Annnnnd an error message. OK, what was the error? The file name was too long. Between the path name and the file name, we had files that exceeded 255 characters. This was basically a 2008 R2 Windows server, so there was no real help for files that exceeded that. While the NTFS file system itself can accept a path plus filename of over 32k characters, the Windows shell API can’t.

Well, crap. This was not going the way I wanted it to. Begin thought process here. Hmmm, Windows says it has a hotpatch that can allow me to work around this… This doesn’t help me with the files that it pseudo-moved already, though. I can’t move/delete/rename or do any useful thing to those files, whether in the shell or in Explorer. (I did discover later that I can delete files locally with filenames past 255 characters if I use WinSCP to do so. This does create a lock on the folder though, so you will need to reboot before you can delete everything.) I can’t run the hotfix in Safe Mode, but I don’t really want to start Windows up in normal mode. I don’t have much choice at this point, so I move the rest of the Exchange DB files over to the D drive. This will allow me to start in regular mode without worrying about Exchange. I now go home to let the server finish the copy of about 350ish GB. A text is sent out that the server is not done, informing the company of the status of our work.

Monday morning 8am

The server is rebooted and it comes up in regular mode – BIG SIGH OF RELIEF – the hotpatch files are retrieved and I try to run them. Every one, even though 2008 R2 is specifically called out, informs me that it will not work on my operating system. Well, this is turning back into a curse-inducing moment… again. Through a friend, I learn of a possible registry entry that might let us work with long file names – this doesn’t work either. Through my frantic culling of websites in my search for a solution, I find out there are two programs that do not use the Windows shell API and so are not hampered by that pesky MAX_PATH limit. (I did find there is a SUBST command I could use at the CLI to try to shorten the paths manually. This is not feasible though, as one user has over 50k files that would need to be renamed.) Those programs are RoboCopy and Fast Copy. Fast Copy looks a little dated, I know, but as I found out, it worked really well. On to the next hurdle! These tools require a Windows-accessible SMB share to work, so we needed to mount a Samba share on the backup appliance and reference the mounted snapshot so we could get to it. This works and a copy is set up to test. 5 minutes in… 10 minutes in… Seems like it’s working. Fast Copy is averaging a little better than 1GB/min transfer speeds as well. I set it up for multiple folders and decide to leave it in peace and go to bed (it is 12am at this point).
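For reference, here is the sort of RoboCopy command line that can handle this kind of copy (the share and destination paths are made up for illustration): /E copies subfolders including empty ones, /COPYALL preserves NTFS security and timestamps, the /R and /W switches keep a single stubborn file from stalling the job for hours, and /LOG writes a log you can review later.

robocopy \\backupappliance\d-snapshot D:\ /E /COPYALL /R:1 /W:1 /LOG:C:\robocopy-d-restore.log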

Tuesday morning

All the files are moved over at this time. Some of them didn’t pull NTFS permissions with them for some odd reason, but no big deal, I’ll just re-create those manually. Exchange needs to be started. Eseutil to the rescue! The DBs were shut down in a dirty state. The logs are also located on the C drive. We are able to find the missing logs though, merge everything back together, and get the DBs mounted. At this point, there are just a few “mop-up” things to do. There was one user that lost about 4 days of email since she was on a lone DB by herself and it was hosted on the C drive. She wasn’t happy, but there was not much we could do with a hardware corruption issue, unfortunately.

Lessons learned from this are as follows (this list is not all-inclusive). You should test the backup solution you are using before you need it. Some things are unfortunately beyond your control though; corruption on the hardware of the backup device is one of those things that just seems like bad luck. You should always have a Restore Plan B, C, … however. To go along with this, realistic RPOs and RTOs should be shared with the customer to keep everyone calm. Invest in good whiskey. And MAX_PATH limits suck but can be gotten around with the programs (whose links I included) above. Happy IT’ing to everyone!

VCIX-NV Objective 1.3 Configure and Manage Transport Zones

Covering Objective 1.3 now, we will go over the following topics:

  • Create Transport Zones according to a deployment plan
  • Configure the control plane mode for a Transport Zone
  • Add clusters to Transport Zones
  • Remove clusters from Transport Zones

So, beginning with the first point: create Transport Zones according to a deployment plan. What is a transport zone? Well, simply, a transport zone is a virtual fence around the clusters that can talk to each other over NSX. If you want a cluster to be able to talk to other clusters that are on NSX, they must be included in the same transport zone. It is important to note that all VMs in a cluster that is part of a transport zone will have access to that transport zone. Another thing to be careful of is that while a transport zone can span multiple VDSs, you should be sure that all the clusters on a given VDS are included in the transport zone. Otherwise, with improper alignment, you may run into situations where machines won’t be able to talk to each other.

As shown in the above example, even though the DVS Compute_DVS spans two clusters, since you add to a transport zone by cluster, it is possible to have just one of the clusters that make up that DVS in the transport zone. This leaves the hosts in Cluster A unable to talk to anyone on the NSX networks.

On to the next point: configure the control plane mode for a Transport Zone. You can choose between three different control plane modes:

  • Multicast
  • Unicast
  • Hybrid

These modes control how BUM (Broadcast, Unknown unicast, Multicast) traffic is distributed, among other things.

Multicast replication mode depends on the underlying architecture being a full multicast implementation. The VTEPs on each host join a multicast group so that when BUM traffic is sent, they will receive it. The advantage of this is that BUM traffic is only distributed to hosts that participate, possibly cutting the traffic down. The downside is that IGMP, PIM, and Layer 3 multicast routing are required at the hardware layer, adding complexity to the original design.

Unicast replication mode is everything multicast is not. More specifically, when a BUM packet is sent out, it is sent to every other host on the local VXLAN segment. The host will then pick a host on each of the other VXLAN segments, designate it a Unicast Tunnel End Point (UTEP), and forward the frame to it; the UTEP will then forward it to all other hosts on its VXLAN segment. The advantage of this is not caring about the underlying hardware at all, which is a great thing from the decoupling-from-hardware standpoint; the downside is that it uses a lot more bandwidth.

Hybrid replication mode is exactly that: hybrid. It is a good mix of the two above. Instead of needing all the things multicast mode requires, only IGMP is used. Unicast is used between the VXLAN segments to avoid the need for PIM and Layer 3 multicast routing, but internally on the VXLAN segment, IGMP is used, which cuts down on the bandwidth quite a bit. With hybrid mode, instead of a UTEP being used between segments, it is now called an MTEP, or Multicast Tunnel Endpoint.

Unicast is what is used most commonly on smaller networks and Hybrid in larger networks.
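If you want to script transport zone creation instead of clicking through the UI, the control plane mode is just a field in the API payload. The sketch below is a rough Python example; the endpoint, the XML shape, and the cluster ID (domain-cXX) are assumptions from my reading of the NSX-v API guide, so verify them against your version before using it. The controlPlaneMode value can be UNICAST_MODE, MULTICAST_MODE, or HYBRID_MODE.

# Hedged sketch - confirm endpoint, payload, and IDs against your NSX-v API documentation.
import requests

nsx_mgr = "https://nsxmanager.lab.local"

tz_spec = """
<vdnScope>
  <name>TZ-Unicast</name>
  <clusters>
    <cluster>
      <cluster>
        <objectId>domain-c7</objectId>
      </cluster>
    </cluster>
  </clusters>
  <controlPlaneMode>UNICAST_MODE</controlPlaneMode>
</vdnScope>
"""

resp = requests.post(
    f"{nsx_mgr}/api/2.0/vdn/scopes",
    data=tz_spec,
    headers={"Content-Type": "application/xml"},
    auth=("admin", "password"),
    verify=False,  # lab only
)
resp.raise_for_status()
print("New transport zone ID:", resp.text)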

As far as adding and removing clusters from Transport Zones goes, you can add clusters at two different times: when you initially create the transport zone, or afterwards. If you do it afterwards, you will need to be in the Installation submenu of the navigation menu on the left side of the screen. You then will need to click on the Transport Zones tab and then click on the transport zone you wish to expand. Then click on the Add Cluster icon, which looks like three little computers with a + symbol on the left side, and select the clusters you wish to add. To remove a cluster, you need to be in the same place, but click on the Remove Clusters icon instead.

That’s the end of section 1. Next up. Section 2. Create and Manage VMware NSX Virtual Networks.