Monitoring the SAN shine with Virtual Instruments

Daily Newsletters

Sign up to ZDNet UK's daily newsletter.

About this blog

The SANMAN

In a World where Data Remains Exponenetial....

It was about three months ago that one of my friends had informed me he was leaving HDS to join a company named Virtual Instruments. ‘Virtual Instruments?’ I asked myself, trying to fathom if I’d have heard of them before only to realize that I had once seen a write up on their SAN monitoring solution, which was then termed NetWisdom. I was then inadvertently asked to mention Virtual Instruments in one of my blogs - nice try pal but I had made it clear several times before to vendors requesting the same that I didn’t want my blog to become an advertising platform. Despite this though I was still intrigued by what could have persuaded someone to leave a genuinely stable position at HDS to a company I hadn’t really had much exposure to myself. Fast forward a few months, several whitepapers and numerous discussions and I find myself writing a blog about the very said company.

Simple fact is it’s rare to find a solution or product in the storage and virtualization market that can truly be regarded as unique. More often than not most new developments fall victim to what I term the ‘six month catch up’ syndrome in which a vendor brings out a new feature only for its main competitor to initially bash it and then subsequently release a rebranded and supposedly better version six months later. The original proponents of thin provisioning, automated tiered storage, deduplication, SSD flash drives etc. can all pay testament to this. It is hence why I have taken great interest in a company that currently occupies a niche in the SAN monitoring market and as yet doesn’t seem to have worthy competitor, namely Virtual Instruments.

My own experience of Storage monitoring has always been a pain in the sense that nine times out of ten it was a defensive exercise in proving to the applications, database or server guys that the problem didn’t lie with the storage. Storage most of the time is fairly straightforward, wherein if there are any performance problems with the storage system they’ve usually stemmed from any immediate change that may have occurred. For example provision a write intensive LUN to an already busy RAID group and you only have to count the seconds before your IT director rings your phone on the verge of a heart attack at how significantly his reporting times have increased. But then there was always the other situation when a problem would occur with no apparent changes having been made. Such situations required the old hat method of troubleshooting supposed storage problems by pinpointing whether the problem was between the Storage and the SAN fabric or between the Server and the SAN but therein dwelled the Bermuda Triangle at the centre of it all i.e. the SAN. Try to get a deeper look into the central meeting point of your Storage Infrastructure and to see what real time changes have occurred on your SAN fabric and you’d subsequently enter a labyrinth of guesses and predictions.

Such a situation occurred to me when I was asked to analyze and fix an ever-slowing backup of an Oracle database. Having bought more LTO4 tapes, incorporating a destaging device, spending exorbitant amounts of money on man days for the vendor’s technical consultants, playing around with the switches buffer credits and even considering buying more FC disks, the client still hadn’t resolved the situation. Now enter yours truly into the labyrinth of guesses and predictions. Thankfully I was able to solve the issue by staying up all night and running a Solaris IOSTAT, while simultaneously having the storage system up on another screen. Eventually I was able to pinpoint (albeit with trial and error tactics) the problem to rather large block sizes and particular LUNs that were using the same BEDs and causing havoc on their respected RAID groups. With several more sleepless nights to verify the conclusion, the problem was finally resolved. Looking back surely there was a better, cost effective and more productive way to have solved this issue. There was but I just wasn’t aware of it.

Furthermore ask any Storage guy that’s familiar with SAN management/monitoring software such as HDS’ Tuning Manager, EMC’s ControlCenter, HP’s Storage Essentials and their like and they’ll know full well that despite all the SNIA SMI-S compliancy they still fail to provide metrics beyond the customary RAID group utilization, historic IOPS/sec, cache hit rate, disk response times etc. in other words from the perspective of the end-user there really is little to monitor and hence troubleshoot. Frustratingly such solutions still fail to provide performance metrics from an application to storage system view and thus also fail to allow the end user to verify if they are indeed meeting the SLAs for that application. Put this scenario in the ever growing virtual server environment and you are further blinded by not knowing the relation between the I/Os and the virtual machines from which they originated.

Moreover Storage vendors don’t seem to be in a rush to solve this problem either and the pessimist in me says this is understandable when such a solution would inevitably lead to a non-procurement of unnecessary hardware. With a precise analysis and pinpointing of performance problems/degradation and you have the consequent annulment of the haphazard ‘let’s throw some more storage at it’, ‘let’s buy SSDs’ or ‘let’s upgrade our Storage System’ solutions that are currently music to the ears of storage vendor sales guys. So amidst these partial viewing vendor provided monitoring tools, which lack that essential I/O transaction-level visibility, Virtual Instruments (VI) pushes forth it’s solution, which boldly claims to encompass the most comprehensive monitoring and management of end-to-end SAN traffic. From the intricacies of a virtual machine’s application to the Fibre Channel cable that’s plugged into your USPV, VMax etc. VI say they have an insight. So looking back had I had VI’s ability to instantly access trending data on metrics such as MB/sec, CRC errors, log ins and outs etc. I could have instantly pinpointed and resolved many of the labyrinth quests I had ventured through so many times in the past.

Looking even closer at VI, there are situations beyond the SAN troubleshooting syndrome in which it can benefit an organization. Like most datacenters if you have one of the Empire State Building-esque monolithic storage systems it is more than likely being under utilized with the majority of its residing applications not requiring the cost and performance of such a system. So while most organizations are aware of this and look to saving costs by tiering their infrastructure onto cheaper storage via the alignment of their data values to the underlying storage platform, it’s seldom a seen reality due to the headaches and lack of insight related to such operations. Tiering off an application onto a cheaper storage platform requires the justification from the Storage Manager that there will be no performance impact to the end users but due to the lack of precise monitoring information, many are not prepared to take that risk. In an indirect acknowledgement to this problem, several storage vendors have looked at introducing automated tiering software for their arrays which in essence merely looks at the LUN utilization before migrating them to either higher-performance drives or cheaper SATA drives. In reality this is still a rather crude way of tiering an infrastructure when you consider it ignores SAN fabric congestion or improper HBA queue depths. In such a situation a monitoring tool that tracks I/Os across the SAN infrastructure without being pigeonholed to a specific device is axiomatic in the enablement of performance optimization and the consequent delivery of Tier I SLAs with cheaper storage – cue VI and their VirtualWisdom 2.0 solution.

In the same way that server virtualisation exposed the under utilization of physical server CPU and Memory, the VirtualWisdom solution is doing the same for the SAN. While vendors are more than pleased to further sell more upgraded modules packed with ports for their enterprise directors, it is becoming increasingly apparent that most SAN fabrics are significantly over-provisioned with utilization rates often being less than 10%. While many SAN fabric architects seem to overlook fan in ratios and oversubscription rates in a rush to finish deployments within specified project deadlines, underutilized SAN ports are now an ever-increasing reality that in turn bring with them the additional costs of switch and storage ports, SFPs and cables.

Within the context of server virtualisation itself, which has undoubtedly brought many advantages with it, one irritating side affect has been the rapid expansion of FC traffic to accommodate the increased number of servers going through a single SAN switch port and the complexity now required to monitor it. Then there’s the virtual maze which starts with applications within the Virtual Machines that are in turn running on multi-socket and multi-core servers, which are then connected to a VSAN infrastructure only to finally end up on storage systems which also incorporate virtualization layers whether that be with externally attached storage systems or thinly-provisioned disks. Finding an end-to-end monitoring solution in such a cascade of complexities seems an almost impossibility. Not so it seems for the team at Virtual Instruments.
Advancing upon the original NetWisdom premise, VI’s updated Virtual Wisdom 2.0 has a virtual software probe named ProbeV. The ProbeV collects the necessary information from the SAN switches via SNMP and on a port to port basis metrics on information such as the number of frames and bytes are collated alongside potential faults such as CRC errors synchronization loss, packet discards or link resets /failures. Then via the installation of splitters (which VI name TAPs - Traffic Access Points) between the storage array ports and the rest of the SAN, a percentage of the light from the fibre cable is then copied to a data recorder for playback and analysis. VI’s Fibre-Channel probes (ProbeFCXs) then analyze every frame header, measuring every SCSI I/O transaction from beginning to end. This enables a view of traffic performance whether related to the LUN, HBA, read/write level, or application level, allowing the user to instantly detect application performance slowdowns or transmission errors. The concept seems straightforward enough but it’s a concept no one else has yet been able to put in practice, despite growing competition from products such as Akorri's BalancePoint, Aptare's StorageConsole or Emulex's OneCommand Vision.

Added to this VI’s capabilities can also provide a clear advantage in preparing for a potential virtualization deployment or dare I fall for the marketing terminology – a move to the private cloud. Lack of insight of performance metrics has evidently led to the stagnation of the majority of organizations virtualising their tier 1 applications. Server virtualization has reaped many benefits for many organizations, but ask those same organizations how many of them have migrated their IO intensive tier 1 applications from their SPAARC based physical platforms to an Intel based virtual one and you’re probably looking at a paltry figure. The simple reason is risk and fear of performance degradation, despite logic showing that a virtual platform with resources set up as a pool could potentially bring numerous advantages. Put this now in the context of a world where cloud computing is the new buzz as more and more organizations look to outsource many of their services and applications and you then have even fewer numbers willing to launch their mission critical applications from the supposed safety and assured performance of the in-house datacenter to the unknown territory of the clouds. It is here where VirtualWisdom 2.0 has the potential to be absolutely huge in the market and at the forefront of the inevitable shift of tier 1 applications to the cloud. While I admittedly I find it hard to currently envision a future where a bank launches it’s OLTP into the cloud based on security issues alone, I’d be blinkered to not realize that there is a future where some mission-critical applications will indeed take that route. With VirtualWisdom’s ability to pinpoint virtualized application performance bottlenecks in the SAN, it’s a given that the consequences will lead to an instantly significant higher virtual infrastructure utilization and subsequent ROI.

The VI strategy is simple in that by recognizing I/O as the largest cause of application latency, VirtualWisdom’s inclusion of baseline comparisons of I/O performance, bandwidth utilization and average I/O completions comfortably provide the necessary insight fundamental to any major virtualization or cloud considerations an organization may be planning for. With its ProbeVM, a virtual software probe that collects status from VMware servers via vCenter, the data flow from virtual machine through to the storage system can be comprehensively analyzed with historical and real-time performance dashboards leading to an enhanced as well as accurate understanding of resource utilization and performance requirements. With a predictive analysis feature based on real production data the tool also provides the user the ability to accurately understand the effects of any potential SAN configuration or deployment changes. With every transaction from Virtual Machine to LUN being monitored, latency sources can quickly be identified whether it’s from the SAN or the application itself, enabling a virtual environment to be easily diagnosed and remedied should any performance issues occur. With such metrics at their disposal and the resultant confidence given to the administrator, the worry of meeting SLAs could quickly become a thing of the past while also rapidly hastening the shift towards tier 1 applications being on virtualized platforms. So despite growing attention being given to other VM monitoring tools such as Xangati or Hyperic, they’re solutions still lack the comprehensive nature of VI.

The advantages to blue-chip, big corporate customers are obvious and as their SAN and virtual environments continue to grow, an investment into a VirtualWisdom solution should soon become compulsory for any end of year budget approval. In saying that though, the future of VI also quite clearly lies beyond the big corporates with benefits which include the enablement of an organization to have real- time proactive monitoring and alerting, consolidation, preemptive analysis of any changes within the SAN or Virtual environment and comprehensive trend analysis of application, host HBA, switches, virtualization appliances, storage ports and LUN performance. Any company therefore looking to either consolidate their costly over-provisioned SAN, accelerate troubleshooting, improve their VMware server utilization & capacity planning, implement a tiering infrastructure or migrate to a cloud would find the CAPEX improvements that come with VirtualWisdom a figure too hard to ignore. So while Storage vendors don’t seem to be in any rush to fill this gap, they too have an opportunity to undercut their competitors by working alongside VI by promoting its benefits as a complement to their latest hardware, something which EMC, HDS, IBM and most recently Dell have cottoned on to having signed an agreement to sell the VI range as part their portfolio. Despite certain pretenders claiming to take its throne, FC is certainly here to stay for the foreseeable future. If the market/customer base is allowed to fully understand and recognize its need, then there’s no preventing a future when just about every SAN fabric comes part and parcel with a VI solution ensuring its optimal use. Whether VI eventually get bought out by one of the large whales or continue to swim the shores independently, there is no denying that companies will need to seriously consider the VI option if they’re to avoid drowning in the apprehensive nature of virtual infrastructure growth or the ever increasing costs of under-utilized SAN fabrics.

Post your comment

In order to post a comment you need to be registered and logged in.

You can also log in with Facebook. Log in or create your ZDNet UK account below

  • Login

Will not be displayed with your comment

By signing up for this service, you indicate that you agree to our Terms and Conditions and have read and understood our Privacy Policy. Questions about membership? Find the answers in the Community FAQ

Get ZDNet UK's daily newsletter

Enter your email address to sign up

ZDNet UK Live

Jake Rayson

@Andy Bolstridge > Unfortunately, we need the majority to work 9-5 And therein lies the lie. I work very hard indeed for my idleness, early starts...

47 minutes ago by Jake Rayson on The Idle Self-employed
Burn-IT

What happens when one hosting platform "acquires data" from another? If I forced the first one to remove it, who is responsible for chasing the...

6 hours ago by Burn-IT on Google picks holes in EU's 'right to be forgotten'
JohnTalich

iSpring Pro is a nice tool, that allows PowerPoint to SCORM conversion. They also have free tool, that also generates SCORM compliant courses.

10 hours ago by JohnTalich on How To Convert PowerPoint To SCORM Compliant Course
aaron.sloman

I think the answer to the question requires a deeper analysis of where the income can come from who else is now competing for it, who else will be...

18 hours ago by aaron.sloman on The three big questions about Facebook's IPO
Brent Pieczynski

Your correctness about Government websites not being compliant with their own websites is correct. Most criticism of other people takes so many...

24 hours ago by Brent Pieczynski on Privacy watchdog to chase big companies over cookie law
Kelvyn Taylor

802.11ac does promise some tricks to improve range & reliability, but not sure how these will work in practice until I get real products to play...

24 hours ago by Kelvyn Taylor via Facebook on Next-generation 802.11ac routers
mrudang009

My wife and I love our new Kindle Fire. It's lightweight, easy to use and has a great interface. The first thing I recommend anyone with a new...

1 day ago by mrudang009 on Waterstones to sell Kindles with in-store offers
mrudang009

It basically unlocks all the Android marketplace apps and unlocks the device. I am one very happy Kindle owner!

1 day ago by mrudang009 on Waterstones to sell Kindles with in-store offers
Burn-IT

Skittles with tapes and coffee cups. Old tapes so we didn't have to rewind them afterwards.

1 day ago by Burn-IT on Ten IT jobs to save up for those rare lulls
Fraud_fighter

What is mildly amusing to me is when someone thinks a strong password is as strong as one may need, when the truth is usernames and passwords are...

1 day ago by Fraud_fighter on Passwords are here to stay: get used to it
Andy Bolstridge

Performance isn't really the big thing at the moment - not when my ADSL connection will only provide a 8mbps bottleneck to the 3.5gbps speeds these...

1 day ago by Andy Bolstridge via Facebook on Next-generation 802.11ac routers
pjc158

So when is Amazon buying Waterstones?

1 day ago by pjc158 on Waterstones to sell Kindles with in-store offers
J.A. Watson

@JoshArg - Well, I am writing this from my N150 Plus, running Ubuntu 12.04 and using a Bluetooth mouse (well, to be totally correct it is a...

1 day ago by J.A. Watson on Samsung N150 Plus Netbook - Ubuntu Netbook Edition 10.04
J.A. Watson

@duncanjmurray - At least n the case of the specific system I put the SSD into, it is not the case. The boot time improvement is substantial, but...

1 day ago by J.A. Watson on Netbook Upgrade - SSD IN, Windows OUT
archerthom

Sounds like only those who have bought their Kindle from Waterstones will be able to use them in-store - very disappointing. I have no intention...

1 day ago by archerthom on Waterstones to sell Kindles with in-store offers
AndyPagin

From my mainframe operating days... 1) Play hoopla with write permit rings & a can of screen cleaner. 2) Make enormous paper chains (Christmas...

1 day ago by AndyPagin on Ten IT jobs to save up for those rare lulls
61253

An OS X perspective Filenames beginning with a dot/period (.) should not be equated with HFS Plus resource forks; misunderstandings around ._ (dot...

1 day ago by 61253 on SharePoint deployment: Pitfalls of a pioneer
ians1

There are many legal download sites for music at least that do not charge an arm and a leg like itunes or Napster. The "real" cost of an mp3 file...

1 day ago by ians1 on The Pirate Bay infringes copyright, High Court decides
Jon Howells

@Crupal.. How does refusing your websites cookies help my privacy? A quick look at your page script reveals four sets of code provided by 3rd...

2 days ago by Jon Howells via Facebook on Privacy watchdog to chase big companies over cookie law
Paul Carloss

There are hundreds, if not thousands of filesharing torrent sites, The Pirate Bay (TPB) is only one of them, while the TPB is blocked many more...

2 days ago by Paul Carloss via Facebook on The Pirate Bay infringes copyright, High Court decides

Community highlights

Jack Schofield

Smartphones run HTML5 a lot slower than PCs

Blog Post Benchmarks run by Spaceport.io show that HTML5 runs "six to ten times slower"...

22 May, 2012 by Jack Schofield
Jake Rayson

3 reasons why Mac is best

Blog Post You thought you’d seen the end of the Flame Wars) between Mac and PC,...

22 May, 2012 by Jake Rayson
Lucy Sherriff

Samsung draws logic-worthy on/off ratio from graphene

Blog Post Researchers at Samsung’s Advance Institute of Technology have developed a...

22 May, 2012 by Lucy Sherriff
Jack Schofield

Windows 8 could speed multi-monitor uptake

Blog Post Microsoft has announced that it will improve multi-monitor support in Windows...

22 May, 2012 by Jack Schofield