You are currently browsing the tag archive for the ‘Cloud Computing’ tag.
On 1 February 2010, when Microsoft Azure officially goes into production, the CTP version will come to an end. In an instant, thousands of Azure apps in some of the remotest corners of the Internet, built with individual enthusiasm and energy, will wink out of existence – like the dying stars of a discarded alternative universe.
Sadly, the only people who will notice are the individual developers who took to Azure, figured out the samples and put something, anything, out there on The Cloud, beaming like proud fathers remembering their first Hello World console app. For the first time we were able to point to a badly designed web page that was, both technically and philosophically, In The Cloud. Even though the people we showed it to barely gave it a second look (it is, after all, unremarkable on the surface) we left it up and running for all the world to see.
Now, Microsoft, returning to its core principle of being aggressively commercial, is taking away the Azure privilege and leaving the once enthusiastic developers feeling like petulant children the week after Easter, when the relaxed chocolate rations come to an end. Developers are now being asked to put in their credit cards to make use of Azure – even the free version. I don’t know about anyone else’s experience, but in mine, ‘free’ followed by ‘credit card details please’ smells like a honey trap.
So it’s not enough that we have to scramble up the learning curve of Azure, install the tools and figure things out all on our own time; we now also have to hand over our credit card details to a large multinational that keeps consumers at arm’s length, is intent on making money, and may give you a bill for an indeterminate amount of computing resources consumed – for all of which you are personally liable.
Gulp! No thanks, I’ll keep my credit card to myself if you don’t mind.
Until adoption becomes mainstream, most Azure development has no commercial benefit for the developers doing it. While some companies are working on Azure ‘stuff’, there is very little in the way of Azure apps out there in the wild and even fewer customers who are prepared to pay for Azure development… yet. A lot of the Azure ‘development’ that I am aware of has been done by individuals, in their own time, on side projects, as they play with Azure to catch the cloud wave, enhance their understanding or simply try something different.
While I understand Microsoft’s commercial aspirations, the financial commitments expected of Azure ‘hobbyists’ run the risk of choking the biggest source of interest, enthusiasm and publicity – the after-hours developer. Perhaps the people in the Azure silo commenting ‘Good riddance to the CTP developers, they were using up all of these VMs and getting no traffic’ have not seen the Steve Ballmer ‘Developers! Developers! Developers!’ monkey dance that (embarrassingly) acknowledges the value and influence of developers who are committed to a single platform (Windows).
It comes as no surprise that the number one feature voted for in the Microsoft-initiated ‘Windows Azure Feature Voting Forum’ is ‘Make it less expensive to run my very small service on Windows Azure’, followed by ‘Continue Azure offering free for Developers’ – the third spot has less than a quarter as many votes. But it seems that nobody is listening – instead they are rubbing their hands in glee, waiting for the launch and expecting the CTP goodwill to turn into credit card details.
Of course there is a limp-dicked ‘free’ account that will suggestively start rubbing up against your already captured credit card details after 25 hours of use (maybe). There is also some half-cocked free-ish version for MSDN subscribers – for those that are fortunate enough to get their employers to hand over the keys (maybe). So there are roundabout ways that a developer can find a way of getting themselves up and running on the Azure platform but it may just be too much hassle and risk to bother.
Personally, I didn’t expect it to happen this way, secretly hoping that @smarx or someone on our side would storm the corporate fortress and save us from their short-sightedness and greed. But alas, the regime persists – material has been produced, salespeople are trained and the Microsoft Azure army is in motion. There won’t even be a big battle. Our insignificant little apps will simply walk up, disarmed, to their masters with their heads hung in shame and, as punishment for not being the next killer app, they will be terminated – without so much as a display of severed heads in the town square.
Farewell Tweetpoll, RESTful Northwind, Catfax and others.
We weren’t given a chance to know you. You are unworthy.
In part 1 of this series I discussed the base technologies (virtualisation, shared resources, automation and abstracted services) that underpin cloud computing. This part deals with how those base technologies have allowed us to envision and adopt new computing models that are central to the cloud computing movement.
Part 2 : Computing Models
From the consumer’s perspective, any external supplier that satisfies the requirements can provide the demanded computing, since the cost and effort of building on-premise, on-demand computing facilities may be overkill for many businesses. As a result, large providers of computing resources are stepping in to provide cloud computing to anybody who wants it and is willing to pay. This does not disqualify the value proposition of the private cloud, but it is the public cloud providers, such as Amazon, that have been pushing the change in computing models.
If consumers require computing resources on demand, it is logical to expect that they only want to pay for those resources when they need them and while they are in use. The pricing of cloud computing is still in its infancy and sometimes quite complicated, but the idea is that consumers pay as they would for any utility like electricity, rather than pay for a whole lot of physical assets that they may or may not use. This has the potential to radically change how businesses serve customers and process data as planning is done and decisions are made based, not on upfront costs, but on dynamic usage cycles and different types and rates of billing.
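To make the utility analogy concrete, here is a rough back-of-the-envelope comparison in Python. All the figures (server price, hourly rate, usage hours) are made-up illustrations, not actual vendor pricing:

```python
# Back-of-the-envelope comparison of upfront vs utility-style pricing.
# All figures below are hypothetical, not real vendor rates.

def upfront_monthly_cost(server_price, lifespan_months, admin_per_month):
    """Amortised monthly cost of buying and running your own server."""
    return server_price / lifespan_months + admin_per_month

def utility_monthly_cost(rate_per_hour, hours_used):
    """Pay only for the hours an instance actually runs."""
    return rate_per_hour * hours_used

# A $3600 server written off over 3 years, plus some admin time
owned = upfront_monthly_cost(server_price=3600, lifespan_months=36, admin_per_month=50)

# The same workload billed by the hour: business hours only vs flat out
part_time = utility_monthly_cost(rate_per_hour=0.12, hours_used=176)
full_time = utility_monthly_cost(rate_per_hour=0.12, hours_used=720)
```

The interesting consequence for planning is the part-time case: a service that only runs when needed costs a fraction of the always-on equivalent, which is exactly the kind of decision upfront purchasing never lets you make.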
Providers of these on-demand resources would, for technical and practical reasons, rather not provide highly specialised resources. It is very difficult to provide an expensive and depreciating high-end server with loads of memory and fast IO, or a machine with a sophisticated graphics processor. Without specialised components, regardless of the underlying infrastructure (which may or may not be assembled out of high-end components), the resources provided are straightforward and anaemic. This changes application architectures, because dedicated and powerful single-node servers are not available and architects cannot make assumptions about the availability and reliability of individual nodes.
There is a difference between a consumer that requires an email service and one that requires a database service, so providers of computing resources need to cater to different markets. Because of the underlying approach and technology, providers generally offer one particular service abstraction, and the different cloud specialisations – IaaS, SaaS, PaaS and others – have emerged and are used to identify the class of cloud computing offering.
If we consider that cloud computing is simply a logical progression of IT technologies, what is it that grabbed the attention of the market and caused vendors to invest so much money in new products and huge datacentres? The reason is that cloud computing opens up new ways of conducting and operating a business and using technology to tackle new markets.
Before looking at the types of businesses that are intrigued by cloud computing, we need to understand the value that businesses see in the cloud. While technologists may find it surprising, not everybody wants to play with cloud computing just because it is shiny and new. Businesses want value – cost savings, reduced risk, increased turnover and other benefits – before they will move systems and infrastructure onto the cloud.
Continue to part 3 : ‘Business Value’
The cloud is hype.
It is the hype around a logical step in the progression of IT and somehow the term ‘The Cloud’ has stuck in the minds of vendors, the media and, to a lesser extent, the customer.
Unlike most terms that we in IT are used to, ‘The Cloud’ is not specific – a customer is never going to want to ‘buy a cloud’ and nobody can, with any authority, say what the cloud is. Disagreement exists on the definition of the cloud and cloud computing – academics, vendors, analysts and customers all disagree to varying degrees. This creates confusion as well as opportunities – every blogger, journalist, vendor, developer and website can slap a cloud sticker on their product, service, website, marketing material and even the forehead of their marketing VP, and deem it to be ‘The Cloud’ or ‘<some new form> Cloud’.
In a world of no definitions, any definition is valid.
So while I am loath to add yet another definition to the world of cloud computing, it seems that any conversation about cloud computing needs to start with some common understanding of the base concepts and principles. I tackled the question: “If cloud computing is based on existing technologies, why has it suddenly become important and talked about only recently?”
I believe the answer is that the base technologies have matured, enabling new computing models; businesses have been able to realise value from those models, and that value, if you trace it back, rests on the technologies that we talk about as being part of cloud computing.
I have written an essay on this and broken it down into four parts, reflecting the layers and progression, and I will post these over the next few days.
Part 1 : Base Technology
At its most basic, cloud computing is about providing and disposing of computing resources quickly, easily and on demand.
Think about Mozy backup – you can get backup for your PC in a few minutes without having to go out and buy a backup disk, plug it in, power it up, install drivers, format, etc. Instead, you download a piece of software, put in your credit card details and ta-da, you have a good backup solution until you don’t want it anymore, in which case you simply cancel the service and you don’t have an external disk lying around that needs to be disposed of. The Mozy example demonstrates computing resources (backup) provisioned rapidly (no waiting for hardware and no hardware setup) that is almost immediately available and can be disposed of just as fast. It is, by a broader definition, Cloudy.
Unfortunately, instantly providing computing resources is not as easy as one would think (as anyone who has seen data centre lead times is aware), so the seemingly simple objective of providing computing resources on demand draws on some base technologies that are generally considered part of cloud computing.
It is the base technologies that have gradually matured over time that have given us the ability to achieve the goal of utilizing computing resources easily, and the following four are the primary influencing technologies.
Obviously, if you want resources and want them now, it doesn’t make sense to have to physically get a new machine, install it in a rack, plug it in and power it up. So a virtual machine that can be spun up within a couple of minutes is key to the ability to provide for the demand. Virtualization also forces the removal of specialized equipment on which software may depend by providing a baseline, non-specialized, machine abstraction.
Individual resource consumers do not want to buy their resources up front – it would go against the idea of ‘on demand’. So it makes sense to create a pool of resources that are potentially available to everyone and are allocated and de-allocated according to individual consumers’ needs. Multi-tenancy is a further concept behind the sharing of resources, where multiple customers share a single physical resource at the same time. Virtual machines running on the same physical hardware are an example of multi-tenancy.
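The pooling idea can be sketched in a few lines. This is a toy model, not how any real provider implements allocation – the point is simply that tenants draw from, and return to, one shared set of resources:

```python
class ResourcePool:
    """Toy model of a shared resource pool: multiple tenants allocate
    from one common set of resources and return them when done."""

    def __init__(self, capacity):
        self.free = list(range(capacity))  # resource ids nobody is using
        self.allocated = {}                # resource id -> tenant name

    def allocate(self, tenant):
        if not self.free:
            raise RuntimeError("pool exhausted")
        rid = self.free.pop()
        self.allocated[rid] = tenant
        return rid

    def release(self, rid):
        # Returned capacity immediately becomes available to any tenant
        del self.allocated[rid]
        self.free.append(rid)

pool = ResourcePool(capacity=4)
a = pool.allocate("tenant-a")
b = pool.allocate("tenant-b")  # two tenants sharing the same pool
pool.release(a)                # tenant-a's capacity goes back for anyone to use
```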
In order to make all of these shared, virtualised resources available on demand, some automation tools need to sit between the request for a resource and the fulfilment of the request – it has to be zero touch by an expensive engineer. Sending an email and waiting for someone in operations to get around to it is not exactly rapid provisioning. So a big part of cloud computing is the tooling and infrastructure to spin up machines, bring new hardware online, handle failures, patch software, and de-allocate and decommission machines and resources.
Computing resources need not be limited to specific low-level hardware resources such as an addressable memory block or a spindle on a disk – not only is this generally unnecessary, it is technically impractical when coupled with quick, on-demand provisioning. A fundamental technology advancement of the cloud is the increased use and availability of abstracted computing resources, consumed as services. While a virtual machine is an abstraction of a much more complicated physical layer, the abstractions become much higher-level when resources are exposed as services: a consumer doesn’t ask for a specific disk, but rather requests resources from a storage service where all of the complicated stuff is abstracted away and taken care of.
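As a sketch of the difference, a storage service exposes named resources and hides physical placement entirely. The class and method names below are invented for illustration and don’t correspond to any real storage API:

```python
class StorageService:
    """Invented sketch of a storage-service abstraction: callers ask for
    data by name and never learn which disk (or machine) holds the bytes."""

    def __init__(self):
        self._blobs = {}  # stand-in for the provider's hidden physical layer

    def put(self, key, data):
        # Where the bytes physically live is the service's problem
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

store = StorageService()
store.put("backups/2010-01.zip", b"backup bytes")
data = store.get("backups/2010-01.zip")  # no disk, spindle or server named
```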
These technical solutions to the demand problem have, in turn, had some interesting side effects on existing models of computing. The public cloud, utility pricing, commodity nodes and service specializations have emerged as rediscovered computing models that are driving the adoption of cloud technologies.
Continue to part 2 : ‘Computing Models’
As the official release of Azure looms, and the initial pricing model is understood, a lot of technical people are crunching numbers to see how much it will cost to host a solution on Azure. It seems that most of the people doing the comparisons are doing them against smaller solutions to be hosted, not in some corporate on-premise data centre, but on any one of hundreds of public .net hosting providers out there.
This is not surprising since the type of person that is looking at the pre-release version of Azure is also the kind of person that has hundreds of ideas for the next killer website, if only they could find the time and find someone who is a good designer to help them (disclaimer: I am probably one of those people). So they look at the pricing model from the perspective of someone who has virtually no experience in running a business and is so technically capable that they have misconceptions about how a small business would operate and maintain a website.
Unsurprisingly they find that Azure works out more expensive than the cost of (perceived) equivalent traditional hosting. So you get statements like this:
“If you add all these up, that’s a Total of $98.04! And that looks like the very minimum cost of hosting an average "small" app/website on Azure. That surely doesn’t make me want to switch my DiscountASP.NET and GoDaddy.com hosting accounts over to Windows Azure.” Chris Pietschmann
Everyone seems shocked and surprised.
Windows Azure is different from traditional hosting, which means that Microsoft’s own financial models and those of their prospective customers are different. You don’t have to think for very long to come up with some reasons why Microsoft does not price Azure to compete with traditional hosting…
Microsoft is a trusted brand. Regardless of well-publicised vulnerabilities (in the technical community) and a growing open source movement, in the mind of business Microsoft is considered low risk, feature rich and affordable.
Microsoft has invested in new datacentres and the divisions that own them need to have a financial model that demonstrates a worthwhile investment. I doubt that in the current economic climate Wall Street is ready for another Xbox-like loss leader. (This is also probably why Microsoft is reluctant to package an on-premise Azure.)
Azure is a premium product that offers parts of the overall solution that are lacking in your average cut-rate hosting environment.
Back to the alpha geeks that are making observations about the pricing of Azure. Most of them have made the time to look at the technology outside their day job. They either have ambitions to do something ‘on their own’, are doing it on the side in a large enterprise or, in a few cases, are dedicated to assessing it as an offering for their ISV.
They are not the target market. Yet.
Azure seems to be marketed at the small to medium businesses that do not have, want or need much in the way of internal, or even contracted, IT services and skills. Maybe they’ll have an underpaid desktop-support type of person who can run around the office getting the owner/manager’s email working – but that is about it. (Another market is the rogue enterprise departments that, for tactical reasons, specifically want to bypass enterprise IT – but they behave similarly to smaller businesses.)
Enterprise cloud vendors, commentators and analysts endlessly debate the potential cost savings of the cloud versus established on-premise data centres. Meanwhile, smaller businesses, whose data centre consists of little more than a broadband wireless router and a cupboard, don’t care much about enterprise cloud discussions. In addressing the needs of the smaller business, Windows Azure comes with some crucial components that are generally lacking in traditional hosting offerings:
As a Platform as a Service (PaaS), there are no low level technical operations that you can do on Azure – which also means that they are taken care of for you. There is no need to download, test and install patches. No network configuration and firewall administration. No need to perform maintenance tasks like clearing up temporary files, logs and general clutter. In a single tenant co-location hosting scenario this costs extra money as it is not automated and requires a skilled person to perform the tasks.
The architecture of Azure, where data is copied across multiple nodes, provides a form of automated backup. Whether or not this is sufficient (we would like a .bak file of our database on a local disk), the idea and message that it is ‘always backed up’ is reassuring to the small business.
The cost/benefit model of Azure’s high availability (HA) offering is compelling. I challenge anybody to build a 99.95% available web and database server for a couple of hundred dollars a month at a traditional hosting facility, or even in a corporate datacentre (the figure comes from the Azure web SLA and works out to around 21 minutes of downtime a month). The degree of availability of a solution needs to be backed up by a business case, and often, once the costs are tabled, business will put up with a day or two of downtime in order to save money. Azure promises significant availability in the box, at a price that could easily be justified against the loss of a handful of orders or even a single customer.
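If you want to check the downtime arithmetic yourself, converting an SLA percentage into a monthly downtime budget is a one-liner (assuming a 30-day month):

```python
def allowed_downtime_minutes(sla_percent, days_in_month=30):
    """Monthly downtime budget implied by an availability percentage."""
    minutes_in_month = days_in_month * 24 * 60
    return minutes_in_month * (1 - sla_percent / 100)

# 99.95% over a 30-day month leaves roughly 21.6 minutes of downtime
web_sla_budget = allowed_downtime_minutes(99.95)
```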
Much is made of the scalability of Azure and it is a good feature to have in hand for any ambitious small business and financially meaningful for a business that has expected peaks in load. Related to the scalability is the speed at which you can provision a solution on Azure (scaling from 0 to 1 instances). Being able to do this within a few minutes, together with all the other features, such as availability, is a big deal because the small business can delay the commitment of budget to the platform until the last responsible moment.
So there are a whole lot of features that need to be communicated to the market – almost like ‘you qualify for free shipping’ when buying a book online, where the consumer is directed to the added value that they understand.
The catch is that the target market does not understand high availability the same way that everyone understands free shipping. The target market for Azure doesn’t even know that Azure exists, or care – they have a business to run and a website to launch. Those technical details need to be sorted out by technical people who need to produce the convincing proposal.
The obvious strength that Microsoft has over other cloud vendors is their channel. Amazon and Google barely have a channel for sales, training and development of cloud solutions – besides, that is not even their core business. Microsoft has thousands of partners, ISVs and trainers, and a huge, loyal following of developers.
In targeting the small to medium business, Microsoft is pitching Azure at the ISVs. The smaller business without internal development capabilities will turn to external expertise, often in the shape of a reputable organisation (as opposed to contractors), for solutions – and the ISVs fulfil that role. So to get significant traction on Azure, Microsoft needs to convince the ISVs of the benefits of Azure and, as this post tries to illustrate, of some of the details of the financial considerations of the small business and their related technology choices.
Microsoft needs to convince the geeks out there that a whole lot more comes with Azure that is very important to smaller businesses and not available from traditional hosting. So Microsoft needs to help us understand the costs, and not just the technology, in order for us to convince our customers that although Azure is not cheap, it makes good financial sense.
Database sharding, as a technique for scaling out SQL databases, has started to gain mindshare amongst developers. This has recently been driven by the interest in SQL Azure, closely followed by disappointment over its 10GB database size limitation, which in turn is brushed aside by Microsoft who, in a vague way, point to sharding as the solution to the scalability of SQL Azure. SQL Azure is a great product and sharding is an effective (and successful) technique, but before developers who have little experience with building scalable systems are let loose on sharding (or even worse, on vendor support for ‘automatic’ sharding), we need to spend some time understanding what the issues are with sharding, the problem that we are trying to solve, and some ways forward to tackle the technical implementation.
The basic principles of sharding are fairly simple. The idea is to partition your data across two or more physical databases so that each database (or node) has a subset of the data. The theory is that in most cases a query or connection only needs to look in one particular shard for data, leaving the other shards free to handle other requests. Sharding is easily explained with a simple single-table example. Let’s say you have a large customer table that you want to split into two shards. You can create the shards by putting all of the customers whose names start with ‘A’ up to ‘L’ in one database and those from ‘M’ to ‘Z’ in another, i.e. a partition key on the first character of the Last Name field. With 13 letters in each shard you would expect an even spread of customers across both shards, but without data you can’t be sure – maybe there are more customers in the first shard than the second, and maybe your particular region has more in one than the other.
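That last-name strategy is simple enough to express as a partition key function. A minimal sketch, with invented names, just to make the mechanics concrete:

```python
def shard_for_customer(last_name):
    """Range partition key: last names 'A'-'L' -> shard 0, 'M'-'Z' -> shard 1."""
    first_letter = last_name[0].upper()
    return 0 if first_letter <= "L" else 1

# Every read or write for a customer must first compute the shard
customers = ["Adams", "Lee", "Miller", "Zhang"]
placement = [shard_for_customer(name) for name in customers]  # [0, 0, 1, 1]
```

The even 13-letter split guarantees nothing about an even row count – that depends entirely on the names your customers actually have.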
Let’s say that you decide it will be better to shard customers by region to get a more even split, with three shards: one for the US, one for Europe and one for the rest of the world. Even then, you may find that although the number of rows is even, the load across each shard differs. 80% of your business may come from a single region, or even if the amount of business is even, the load will differ across different times of the day as business hours move across the world. The same problem exists across all primary entities that are candidates for sharding. For example, your product catalogue sharding strategy will have similar issues. You can use product codes for an even split, but you may find that top-selling products are all in one shard. If you fix that, you may find that top-selling products are seasonal, so today’s optimal shard will not work at all tomorrow. The problem can be expressed as:
The selection of a partition key for sharding depends on the number of rows that will be in each shard and the usage profile of each candidate shard over time.
Those are some of the issues in just trying to figure out your sharding strategy – and that is the easy part. Sharding comes with a rule that the application layer is responsible for understanding how the data is split across the shards (the term ‘partitioning’ tends to be reserved for the RDBMS case, where the split is transparent to the application). This creates some problems:
The application needs to maintain an index of partition keys in order to query the correct database when fetching data. This means that there is some additional overhead – database round trips, index caches and some transformation of application queries into the correctly connected database query. While simple for a single table, it is likely that a single object may need to be hydrated from multiple databases and figuring out where to go and fetch each piece of data, dynamically (depending on already fetched pieces of data), can be quite complex.
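A bare-bones version of such an application-level index might look like the following. The shard map, region keys and connection strings are entirely hypothetical – a real implementation would also need caching, re-balancing and multi-entity lookups:

```python
# Hypothetical application-level shard index: the region keys and
# connection strings are illustrative, not a real configuration.
SHARD_MAP = {
    "US": "Server=shard-us;Database=crm",
    "EU": "Server=shard-eu;Database=crm",
    "ROW": "Server=shard-row;Database=crm",
}

class ShardRouter:
    """Resolves which physical database holds a given partition key."""

    def __init__(self, shard_map):
        self.shard_map = shard_map

    def connection_for(self, region):
        # Every data access must route through a lookup like this one
        try:
            return self.shard_map[region]
        except KeyError:
            raise LookupError(f"no shard configured for region {region!r}")

router = ShardRouter(SHARD_MAP)
conn = router.connection_for("EU")  # then open the connection and query
```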
Any sharding strategy will always be biased towards a particular data traversal path. For example, in a customer biased sharding strategy you may have the related rows in the same shard (such as the related orders for the customer). This works well because the entire customer object and related collections can be hydrated from a single physical database connection, making the ‘My Orders’ page snappy. Unfortunately, although it works for the customer oriented traversal path, the order fulfilment path is hindered by current and open orders being scattered all over the place.
Because the application layer owns the indexes and is responsible for fetching data, the database is rendered impotent as a query tool – each individual database knows nothing about the other shards and cannot execute a query accordingly. Even if each database had access to the shard index, it would trample all over the application layer’s domain, causing heaps of trouble. This means that all data access needs to go through the application layer, which creates a lot of work to implement an object representation of all database entities, their variations and query requirements. SQL cannot be used as a query language, and neither can ADO.NET, OLE DB or ODBC – making it impossible to use existing query and reporting tools such as Reporting Services or Excel.
In some cases, sharding may even be slower. Queries that need to aggregate or sort across multiple shards cannot take advantage of the heavy lifting performed in the database, and you will end up re-inventing the wheel by developing your own query optimisers in the application layer.
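For example, a cross-shard ORDER BY or SUM – trivial in a single database – has to be re-implemented in application code as a scatter-gather: query each shard, then merge and aggregate the partial results yourself. A minimal sketch with hard-coded, pre-sorted stand-ins for per-shard query results:

```python
import heapq

# Each shard can only sort and total its own rows, so the application
# must combine the partial results. Rows here are (last_name, order_total).
shard_results = [
    [("Adams", 120), ("Brown", 85)],   # pre-sorted rows from shard 0
    [("Miller", 200), ("Ng", 40)],     # pre-sorted rows from shard 1
]

# Cross-shard ORDER BY last_name: a k-way merge of the sorted streams
merged = list(heapq.merge(*shard_results, key=lambda row: row[0]))

# Cross-shard SUM(order_total): aggregation the database can no longer do
total = sum(amount for rows in shard_results for _, amount in rows)
```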
In order to implement sharding successfully we need to deal with the following:
The upfront selection of the best sharding strategy. What entities do we want to shard? What do we want to shard on?
The architecture and implementation of our application layer and data access layer. Do we roll our own? Do we use an existing framework?
The ability to monitor performance and identify problems with the shards in order to change (and re-optimise) our initially chosen sharding strategy over time as the amount of data and usage patterns change over time.
Consideration for other systems that may need to interface with our system, including large monolithic legacy systems and out-of-the-box reporting tools.
So some things to think about if you are considering sharding:
Sharding is no silver bullet and needs to be evaluated architecturally, just like any other major data storage and data access decision.
Sharding of the entire system may not be necessary. Perhaps only the part of the web front-end that needs performance under high load should be sharded, and the back-office transactional systems don’t need to be sharded at all. So you could build a system that has a small part sharded and migrates data to a more traditional model (or even a data warehouse) as needed.
Sharding for scalability is not the only approach for data – perhaps some use could be made of non-SQL storage.
Hand-coding all of the application objects may be a lot of work and difficult to maintain. A framework or code generation tool can assist, but it has to be feature complete and handle the issues raised in this post.
You will need to take a very careful, behavioural or domain-driven approach to the requirements. Creating a solution where every entity is sharded, every object is made of shards, and every possible query combination that could be thought up is implemented is going to be a lot of work and will result in a brittle, unmaintainable system.
You need to look at your database vendors’ support of partitioning. Maybe it will be good enough for your solution and you don’t need to bother with sharding at all.
Sharding, by splitting data across multiple physical databases, loses some (maybe a lot) of the essence of SQL – queries, data consistency, foreign keys, locking. You will need to understand whether that loss is worthwhile – maybe you will land up with a data store that is too dumbed down to be useful.
If you are looking at a Microsoft stack specifically, there are some interesting products and technologies that may affect your decisions. These observations are purely my own and are not gleaned from NDA sourced information.
ADO.NET Data Services (Astoria) could be the interface at the application level in front of sharded objects. It replaces the SQL language with a queryable RESTful language.
The Entity Framework is a big deal for Microsoft and will most likely, over time, be the method with which Microsoft delivers sharding solutions. EF is destined to be supported by other Microsoft products, such as SQL Reporting Services, SharePoint and Office, meaning that sharded EF models will be able to be queried with standard tools. Also, Astoria supports EF already, providing a mechanism for querying the data with a non-SQL language.
Microsoft is a pretty big database player and has some smart people on the database team. One would expect that they will put effort into the SQL core to better handle partitioning within the SQL model. They already have Madison, which although more read-only and quite closely tuned for specific hardware configurations, offers a compelling parallelised database platform.
The Azure platform has more than just SQL Azure – it also has Azure storage which is a really good storage technology for distributed parallel solutions. It can also be used in conjunction with SQL Azure within an Azure solution, allowing a hybrid approach where SQL Azure and Azure Storage play to their particular strengths.
The SQL Azure team has been promising some magic to come out of the Patterns & Practices team – we’ll have to wait and see.
Ayende seems to want to add sharding to NHibernate.
Database sharding has typically been the domain of large websites that have reached the limits of their own, really big, datacentres and have the resources to shard their data. The cloud, with small commodity servers such as those used by SQL Azure, has raised sharding as a solution for smaller websites, but smaller operations may not be able to pull it off because of a lack of resources and experience. The frameworks aren’t quite there and the tools don’t exist (like an analysis tool for candidate shards based on existing data) – and without those tools it may be a daunting task.
I am disappointed that the SQL Azure team throws out the bone of sharding as the solution to their database size limitation without backing it up with tools, realistic scenarios and practical advice. Sharding a database requires more than hand waving and PowerPoint presentations; it requires a solid engineering approach to the problem. Perhaps they should talk more to the Azure services team and offer hybrid SQL Azure and Azure Storage architectural patterns that are compelling and architecturally valid. I am particularly concerned when sharding is offered as a simple solution to small businesses that have to make a huge investment in a technology and an architecture that they are possibly unable to maintain.
Sharding will, however, gain traction and is a viable solution to scaling out databases, SQL Azure and others. I will try and do my bit by communicating some of the issues and solutions – let me know in the comments if there is a demand.