Saturday, 27 April 2013

Capacity Planning the #Devops way

The notion of #Devops serves to accelerate time to market through greater cohesion in the release management life cycle. 

So-called 'service virtualisation', such as offerings from IBM and CA LISA, enables modular testing practice by learning typical behaviour patterns of defined systems. The effect is a more tightly focused testing process that reduces the dependency on external (inert) services.

Release Automation, such as in the newly acquired Nolio solution, allows the testing process to be further streamlined by providing cohesion through the multistage process. The benefits are most keenly felt where complex dependencies and configurations add to the effort of setup and teardown for QA.

Agile methods need agile release management processes, and this is the whole point of #Devops. However, the risks in this agile thinking come in end-to-end performance.

The missing link here is provided by prerelease Capacity Planning (such as provided by CA Performance Optimizer), a virtual integration lab that brings together the performance and capacity footprints of each component in the production service. And while some of those components may be changing and therefore measured through the release management process, others are not - and are measured in production. Creating and assimilating component performance models allows the impact of each sprint on IT operations to be shown.

Capacity Planning is a true #Devops process.  Only by adapting the capacity plan to take into account the step changes due to software releases can the risks of poor scalability and damaging impact be accurately guarded against.

Monday, 25 March 2013

Take a Capacity Healthcheck


Heart, lungs, liver, kidneys: we all recognize the importance of looking after our vital organs - and regularly take medical advice whenever we have concerns.  Yet it is not the fitness of individual organs that causes concern - it is the fitness of the weakest.

But what about our IT enterprise?  It's not uncommon - even in 2013 - to come across siloed thinking that stifles the health of the organisation.  Purchasing decisions are made within the silos, leading to distorted allocations of capacity based on political, rather than engineering, needs.  Further - because of these silos, entropy increases as financial accountability struggles to permeate the organisation.  Provisioning decisions are made on a risk-averse basis without insight into how business demands translate into capacity requirements.

And now it's time to change.

The financial crisis has caused a significant change in emphasis in most major corporations.  IDC estimate that over 50% of major corporations are actively planning investments in better capacity management functions.  Cloud-sourcing is on the increase to assist with the deferment of cost, and a refocus of investment on the core business.  Capacity has become a commodity, and in an open economy, is becoming subject to the same commercial forces that balance cost and quality in any marketplace.


But how can enterprises leverage their purchasing power, when they don't know how much commodity capacity they really need?  How can they right-size investments and defer costs without incurring risk to their top-line revenues?

Actually, this is a question that enterprises have addressed in many of their other lines of operation.  Full financial accountability has ensured the right-sizing of many other enterprise assets; whether that is employees, hot-desks, freight containers or manufacturing capacity.  Successful companies have figured out that costs must be aligned proportionately with revenues.  The only thing that makes IT different is the complexity, and the lack of insight.

So where to start in right-sizing IT?

The answer is perhaps startlingly clear - you should begin where you always begin, with your requirements in mind: measure capacity consumption across all silos.  The trick then is to bring in a method of normalization, a model library that provides weighting factors according to the make and configuration of your estate.  This same method can then be applied to plan a migration, transposing configurations easily to determine the optimum sizing on alternative real-estate.
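
As a rough illustration of that normalization step, the sketch below converts raw consumption from several silos into one common unit and then sizes an alternative platform. The server models, weighting factors, utilization ceiling and estate figures are all invented for the example; a real model library would supply benchmark-derived values.

```python
import math

# Hypothetical model library: normalized capacity units per core for each
# server model (illustrative values only, not real benchmark data).
MODEL_LIBRARY = {
    "vendor-a-2010": 10.0,
    "vendor-b-2012": 14.0,
}

def normalized_usage(model, cores, utilization):
    """Consumed capacity of one server, expressed in normalized units."""
    return MODEL_LIBRARY[model] * cores * utilization

def required_cores(demand_units, target_model, max_utilization=0.7):
    """Cores needed on the target model to carry the demand while staying
    below an assumed utilization ceiling."""
    per_core = MODEL_LIBRARY[target_model] * max_utilization
    return math.ceil(demand_units / per_core)

# Measured estate across silos: (model, cores, average utilization)
estate = [
    ("vendor-a-2010", 8, 0.50),
    ("vendor-a-2010", 8, 0.25),
    ("vendor-b-2012", 16, 0.25),
]

total = sum(normalized_usage(m, c, u) for m, c, u in estate)
print(total, required_cores(total, "vendor-b-2012"))
```

The same two functions serve both purposes mentioned above: summing normalized usage gives the cross-silo picture, and dividing it by the target platform's weighting factor transposes it for a migration.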

So what's stopping you? Find a friendly capacity management service provider and ask them about their IT Capacity Healthcheck. If they know what they're doing, they'll get their scheduler out straight away.  And if they don't? Well - drop me a tweet or an email and I'll point you in the right direction.

Thursday, 24 January 2013

Consumer/Provider : the twin forces in Capacity Management

Those schooled in traditional IT capacity management have long recognised the cause and effect of observed system behaviour.  Few have managed to bridge the gap in quantifying the correlation, however, and for good reason.  Straying too deep into this territory can leave you struggling with data overload, and no way of mapping volumetric and utilization data together. The age of the CMDB and automated discovery and mapping has changed the landscape in this regard.  Finally, using configuration mapping to correlate volumetric data against utilization data can be done reliably, consistently and accurately since all feeds are automated.

Correlating service throughput against observed utilization provides intelligence to optimise design, streamline performance, and predict and optimize application scalability.  But in a consumer/provider scenario there are two contexts to consider. Presenting the customer with data about your underlying infrastructure utilisation lays bare the margin or risk levels of your operating model. Equally, the customer's main concern is ensuring their service levels are not jeopardised, and they are not burdened with excessive costs for underutilized environments.
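
To make the throughput/utilization correlation concrete, here is a minimal sketch using an ordinary least-squares line. The sample figures and the 75% utilization ceiling are invented, and a straight line is itself an assumption; real systems often stop scaling linearly well before saturation.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented samples: service throughput (transactions/s) and CPU utilization (%)
throughput = [100, 200, 300, 400]
cpu_util = [15, 25, 35, 45]

slope, intercept = fit_line(throughput, cpu_util)

# Throughput at which the fitted line reaches the assumed utilization ceiling
ceiling = 75
max_throughput = (ceiling - intercept) / slope
print(slope, intercept, max_throughput)
```

The slope is the utilization cost of each extra transaction per second, and the intercept is the background load that exists even with no traffic.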

Despite the advantages commonly sought in quantifying the capacity of the physical environment, it is the capacity of the contractual environment that is crucial to the customer. In a cloud context, the provider must diligently ensure the reliability of their operating model. This is crucial to brand equity. But the customer's primary concern will be in managing the flexibility of their service-based contract, and ensuring that risks are properly balanced against costs.

Monday, 14 January 2013

To Transform IT - Revisit The Basics

Big Data and the world of business analytics have much in common with Capacity and Demand Management as we know it.  Purporting to derive competitive advantage by acting on timely business intelligence, business operations analytics requires number crunching on a huge scale.  In the highly competitive world of e-business, the imperative for business agility reaches its peak.  Where aligning appropriate investment with prevailing demand becomes a critical business decision, that decision is no less important to the world of capacity management.  Meaning - business agility depends on Big Data to make sure there are enough sales reps selling hot products, to make sure there are enough of the right sort of product on the shelves, and to commit the right amount of marketing to the products or services delivering the highest profit. The connection with the IT cloud here is clear - aligning IT resources to demand is equally critical to the agile business.  Indeed, such agility is one of the main drivers behind cloud computing.  By transforming IT delivery into a service model, one has the ability to quickly and easily ramp up investment when warranted by demand - or to ramp down.

Nice idea.  But does this happen widely in the field?  Evidence indicates that the transformation to the cloud model in the majority of organisations has hit a glass ceiling.  With the existence of service catalogues, virtual adaptable infrastructures, and increasingly automated processes - IT organisations have put in place the basic ingredients that enable some of the tasks associated with cloud service provision.  However, the vision of agile IT-on-demand has been held back by slow adoption of a business-integrated view more aligned with the balance sheet.  IT resources like many business resources come at a cost.  Not just a cost to purchase, but a cost to provision, a cost to operate, a cost to maintain and a cost to support.  Factoring cost of ownership into resource provisioning requests, and aligning investment appropriately according to demand, are the two pre-requisites for the agile business.

These pre-requisites translate themselves into two management capabilities that are widely missing in IT operations today.  Adding these capabilities to IT management functions will not only provide insight and control over efficient use of IT resources, but will also provide consumer-friendly insight to support optimal alignment of resources.

Firstly, by garnering control of operational costs - IT cloud operations can start to truly drive efficient investments.  A simple cost / utilization correlation is an excellent start to determining efficiency.  For the most accurate approach, this analysis should be carried out by the Capacity Management team to factor in variables like the different sorts of utilisation (meaning virtual, physical, logical - all of which are environment-related), and for what-if scenario analysis to determine the possibilities for optimization.
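
A simple cost / utilization correlation can be as small as the sketch below: divide what an environment costs by the capacity it actually uses, then rerun the sum under a what-if assumption. The environments, costs and utilization figures are invented, and "capacity units" stands for whatever normalized unit the Capacity Management team has settled on.

```python
def cost_per_used_unit(monthly_cost, capacity_units, utilization):
    """Cost of each unit of capacity that is actually consumed."""
    return monthly_cost / (capacity_units * utilization)

# Invented environments: (monthly cost, capacity units, average utilization)
environments = {
    "physical": (9000.0, 100.0, 0.30),
    "virtual": (12000.0, 200.0, 0.60),
}

efficiency = {
    name: cost_per_used_unit(*figures) for name, figures in environments.items()
}

# What-if scenario: consolidation doubles utilization of the physical estate
what_if = cost_per_used_unit(9000.0, 100.0, 0.60)
print(efficiency, what_if)
```

In this made-up case the physical estate costs three times as much per consumed unit as the virtual one, and the what-if halves that figure without touching the bill.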

Secondly, by assessing usage patterns quickly and dynamically, and providing short and mid-term trends to the consumer.  The aim here is to ensure the right amount of headroom is maintained in the environment.  A service-aligned view of capacity allocated is essential, such that the service headroom can be correctly calculated according to the weakest link in the chain.  The insight that's needed is to gauge whether the service headroom will be sufficient to meet demands, according to trends, forecasts and other business analytics.  Regression and correlation between workload and resource usage is another function classically described in Capacity Management.
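
The weakest-link calculation might look like the following sketch. It assumes each component's usage scales linearly with service demand, which is a simplification, and the tiers, figures and 20% forecast are invented.

```python
def service_headroom(components):
    """Return the limiting component and the demand growth it allows.

    components maps a name to (current_usage, usable_capacity), both in the
    same unit for that component. Usage is assumed to scale linearly with
    service demand.
    """
    growth = {
        name: capacity / used - 1.0
        for name, (used, capacity) in components.items()
    }
    weakest = min(growth, key=growth.get)
    return weakest, growth[weakest]

# Invented service chain: (current usage, usable capacity)
service = {
    "web": (40.0, 100.0),
    "app": (50.0, 80.0),
    "database": (60.0, 75.0),
}

weakest, headroom = service_headroom(service)

# Compare against an assumed demand forecast of +20%
forecast_growth = 0.20
sufficient = headroom >= forecast_growth
print(weakest, headroom, sufficient)
```

The web tier could absorb far more growth here, but the service as a whole can only grow as far as the database tier allows.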

So - what are we saying here?  That Capacity Management is the missing link between cloud operations and an agile enterprise?  No - not quite.  Capacity Management as it is currently executed and understood is not fit for this purpose.  However, connecting Capacity Management with both the demand cycle (notably from a service, not an infrastructure, point of view) and Financial Management has the ability to disrupt the enterprise cloud, and transform it to become a true partner to the agile business.

Monday, 19 November 2012

Leadership .. and the rush to the cloud

Might seem strange to write a post about leadership, on a cloud computing blog; but think about it this way -- the 'Cloud' is all about leadership.  It's the bandwagon that you've got to get on, irrespective of where it's going.  It's the "common wisdom" where 'thought leaders' are too conscious of the opinions of others and begin to emulate each other and conform rather than think differently (wiki link).

The reason I think this is happening, is that the fundamentals of IT service delivery haven't gone through a revolution in the last 5 years.  Sure, there have been leaps forward - notably in the smartphone and tablet markets, which have radically influenced accessibility and demand for services.  Over the last 5 years, we have also seen incremental advances in IT capacity, in networks (notably end-user bandwidth which has been driven by the increasing demand), and in compute and storage terms.  But the fundamentals of IT service delivery haven't changed.  If you had implemented ITIL 5 years ago, you would still have the same frame of reference today and it would serve you well.

The difference is perception, and the advances of running IT like a business.  Yet, this was one of the main strategies before the cloud came along.  The reality is that business caught up with IT, and debunked the myths of risk-averse culture that became prevalent in many large enterprises.  The business started to demand quality of service, and began to put a focus on costs.  Just like an enterprise would manage costs in any other part of its business, IT soon found that they were under similar cost pressures - and these became accelerated as the global downturn impacted profit margins.

What's really interesting though is the way that certain business models have begun to prosper in this new dynamic.  Those are models that allow businesses to move away from large sunk capital investments, and towards a flexible model that allows them to account for their costs as a percentage of their revenue stream.  There clearly is a great deal to be gained in accounting transparency here, but there's more - these flexible arrangements allow businesses to scale their cost base according to their overriding dynamic.  On the face of it, it's a low-risk engagement for the customer.

But here comes the rub.  As any risk analyst will tell you, it's the weakest link in the chain that tells you where your true risk really lies; and of course there are a number of risks associated with moving to this flexible arrangement that could scupper the whole deal.  For several years, the security risks of losing personal data were often quoted as a show-stopper.  More reputable companies offering their services have mitigated those risks -- for now at least.  There are a certain number of regulatory factors to take into account, not least the actual jurisdiction of the data stores.  Balancing these competing risk factors is the business of IT leaders.

Wednesday, 17 October 2012

Strip Down Cloud: the basics of cloud provision


Let me set my stall out: I really think that "in the cloud" is not a term created by IT professionals, or even marketing teams.  I think it is a term created by the archetypal technophobe businessman who doesn't want to be bothered by the details of IT, and just wants a service delivered - and doesn't care how.  He* wants it like a phone contract - something that can be tailored to the number of minutes, the number of texts - and can be flexible to move with his business.

All this stuff about what the cloud really is - is really just guff.  Take for example self-service.  This business guy doesn't care about self-service.  In fact, it would be perfect if somebody else could do it for him.  He wants a way of managing the contract himself - but he doesn't want to administer the service himself.  If his business volumes go up - he wants to adjust the contract, so he has the capacity to support his business.  If the volumes go down, likewise - the alignment with his business needs is what's important to him.

Take virtualization technology.  This is a means to an end, it provides the rapid provisioning that this business guy really wants.  But he doesn't care about virtualization.  He cares about rapid response.  If he orders more minutes on his phone contract; he wants the minutes instantly (although he might be satisfied to wait until the end of the monthly billing period).  The same thing is true with his IT cloud.  He wants to adjust his contract - and then wants to see rapid implementation of the changes.  But it could be a horde of magic goblins for all he cares.  

The only things this guy cares about are the quality of service he is receiving, the cost of the service, and the ability to flexibly manage this service.  Just like his phone contract -- if the quality is no good, he will cancel it and move to a provider with better performance.  If the cost is too high, he will move to a more competitive provider.  And the flexibility in the contract will allow him to do that (although phone contracts often have a lock-in term; but of course if the businessman was designing the terms, then it wouldn't).

So what are the essentials of the cloud, from a technologist's point of view?
  1. the ability to measure and manage a quality of service.  A provider who values customer service (and many would argue that customer service is the cornerstone of a successful selling organisation) will proactively manage service levels and ensure that the cloud customer is getting the service levels that they need - and that they are contracted to.  For this, we would recommend not only some level of service assurance monitoring but also some risk avoidance through predictive analytics; typically found in a capacity planning process.  In addition, where service levels are set either contractually or through expectation, some form of management of performance against those service levels - business service insight - will be imperative.
  2. the ability to manage cost and capacity effectively.  Given that an unsatisfied customer may change contracts freely or at will - and that the cloud marketplace is a competitive one, cost is the second important factor in a customer's investment decision.  Cost in a cloud environment is borne mainly through infrastructure operations, and ties together elements like facilities, management, power/cooling, and capital costs.  Uptime Institute published a very good paper on this recently (click here).  Equally, though, the price that is charged to the customer must either equate to the cost (in an internal private cloud charged as a cost center) or exceed the cost (as a profit center in a business) and be derived from the cost of the allocated capacity.
  3. the ability for a customer to flexibly manage their contract.  There must be an easy way for the customer to change their service.  Increasingly tech-savvy customers demand portals by which they can manage their own service levels.  A self-service portal is often the lowest cost way of providing this capability.  However, the cloud does not mandate a portal; in fact a call-center can provide the same facility.  Most of the time I manage my phone contract through a call center; and the benefit is that my provider gets the chance to sell me something new every time..!
  4. the ability to rapidly deploy any changes to a customer's service.  The cheapest and quickest way of doing this is likely through usage of virtualization technology, where existing and unused capacity can be allocated to a customer.  New technologies are emerging here all the time, around storage and network capacity as well as compute capacity.  Hybrid cloud providers are using third-party capacity to extend their capability quickly and leverage existing data center space. 
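
To illustrate the second point in the list above, here is a minimal sketch of deriving a price from the cost of allocated capacity. The cost elements mirror the ones named there, but every figure, the 1,000 sellable units and the 25% margin are invented for the example.

```python
def unit_cost(total_monthly_cost, sellable_units):
    """Monthly cost of one unit of sellable capacity."""
    return total_monthly_cost / sellable_units

def price(allocated_units, cost_per_unit, margin=0.0):
    """Price for a customer's allocated capacity.

    margin=0.0 models a cost center (price equals cost);
    margin > 0.0 models a profit center (price exceeds cost).
    """
    return allocated_units * cost_per_unit * (1.0 + margin)

# Invented monthly cost elements for the whole environment
costs = {
    "facilities": 20000.0,
    "management": 15000.0,
    "power_cooling": 10000.0,
    "capital": 35000.0,
}

cost_per_unit = unit_cost(sum(costs.values()), sellable_units=1000)
internal_charge = price(50, cost_per_unit)               # private cloud
external_price = price(50, cost_per_unit, margin=0.25)   # commercial provider
print(cost_per_unit, internal_charge, external_price)
```

Whatever margin is chosen, the point is that the price traces back to the capacity allocated, so an adjustment to the contract changes the bill in a way both parties can follow.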

*He could be a *She too; and probably is

Thursday, 27 September 2012

Today, a MHz is just not a MHz any more...

A common trend in virtualization environments is to use the easily-accessible MHz rating of the server as a normalization parameter - so that when you're considering an optimization routine, you can identify available capacity in terms of MHz and compare it to some other capacity being used, and determine whether there's a fit - or not.  While this method of normalization makes complete sense in terms of the data available, I'm here with some bad news.  They just don't make MHz like they used to.  Actually, in many cases, they make them better!  The SPECint2006_rate benchmark is a measure of throughput for CPUs.  This allows a direct comparison with MHz, which is itself directly correlated with throughput.



Confused about capacity?  Take this example...  How much oil you can get through a pipeline is proportional to the cross-section of the pipeline - how fat it is - and the speed at which the oil moves through the pipeline.  Take that into a digital context - and the cross-section of a CPU is related to the number of cores, and the speed of the CPU is measured in MHz.  The clock speed is the frequency of the chip - and defines how quickly a task can be processed by the CPU.



They don't make 'em like they used to...

The problem though, is that the clever guys at Intel and the other processor manufacturers don't want to play this game.  They're always thinking of new ways of boosting performance that don't rely on just a MHz improvement.  Take a look at the data.  The chart below shows the ratio of SPECint2006_rate to GHz over the last 6 years, controlling for the number of cores in the benchmark measurement.  This shows that a GHz in 2011 is equivalent to 1.15GHz just 12 months earlier.  Another interesting point is that AMD data doesn't show this same rate of change - the trend line has a much lower gradient.  This highlights that the chipset is a hugely important factor when using MHz as a normalization rating.  A MHz just isn't a transferable unit.


Data for HP ProLiant (Intel only), directly from SPEC.ORG
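
The ratio plotted in the chart can be reproduced for any pair of servers with a calculation like the sketch below. The benchmark scores here are invented, not real SPEC submissions; they are chosen only so that the result matches the 1.15 figure quoted above.

```python
def rate_per_core_ghz(spec_rate, cores, ghz):
    """Benchmark throughput delivered per core per GHz of clock speed."""
    return spec_rate / (cores * ghz)

# Invented results for two servers with identical core count and clock speed
older = rate_per_core_ghz(spec_rate=240.0, cores=8, ghz=3.0)
newer = rate_per_core_ghz(spec_rate=276.0, cores=8, ghz=3.0)

# How many "older" GHz one "newer" GHz is worth
equivalent_ghz = newer / older
print(older, newer, equivalent_ghz)
```

Both machines would look identical under a cores x MHz normalization, yet the newer one delivers 15% more throughput in this example, which is exactly the gap that a MHz rating hides.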


Conclusion

Normalization is a key part of good capacity management practice.  Using a percentage is simply a recipe for disaster when trying to apply intelligence to configuration optimization.  Using MHz is an easy option, but alas is just fool's gold.  The data for Intel chips shows that the processing rate for chips changes dramatically over time, and that could introduce errors of over 15% per 12-month period for optimization.  If you were moving from legacy kit, 3 years old, the error margin may be over 75%.  This will always lead to over-specified machines - and a higher spend than necessary to meet business requirements.  Whilst that amount of headroom may have been justified in directly allocated capacity, in the cloud that overspend represents a high cost of ownership and immediate optimization challenges on deployment.
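
For a feel of how the yearly drift accumulates, the sketch below compounds an annual rate over several years and also works backwards from a cumulative figure. Compounding is my assumption about how the yearly changes combine; the chart data may behave differently.

```python
def cumulative_drift(annual_rate, years):
    """Total error after compounding a yearly drift over several years."""
    return (1.0 + annual_rate) ** years - 1.0

def annual_rate_for(total_drift, years):
    """Yearly drift that compounds to the given total over the period."""
    return (1.0 + total_drift) ** (1.0 / years) - 1.0

three_year = cumulative_drift(0.15, 3)   # drift at exactly 15% a year
implied = annual_rate_for(0.75, 3)       # yearly rate behind a 75% total
print(round(three_year, 3), round(implied, 3))
```

At exactly 15% a year the three-year error comes to roughly 52%, while a 75% error over three years corresponds to a little over 20% a year, so the larger figure sits at the upper end of "over 15%".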