View Full Version : Minimum CPU level?


Julez
24th September 2007, 10:05
I currently have a dedicated server and traffic is very steady and relatively low, so flexiscale's pay-as-you-go pricing works for me. I also like the added hardware reliability of flexiscale - not being reliant on any one hardware component.

However, I don't like the idea of my server's performance dropping below an acceptable level due to high levels of activity on other servers. This was why I moved off shared servers & onto dedicated in the first place.

What concerns me about flexiscale is that it will attract other clients with wildly fluctuating traffic with occasional very high peaks such as Cheddarvision. These are exactly the kind of rack neighbours I want to avoid.

Any server is impacted by high network traffic to another server in the same rack, but at least a dedicated server guarantees 100% CPU availability.

I realise that stressed servers can be switched to receive more resources and this reduces the risk of impact on other servers.

However, what I need to know is that even in a WORST CASE scenario, my servers will get a minimum level of CPU.

If this is a planned feature, is there an ETA?

Cheers,
Julez

phuber
27th September 2007, 23:26
Julez,

Sorry for not replying sooner to your questions. Once you are through my reply, you may appreciate the delay.

This is a very good question indeed. If there is one question that has occupied us internally more than any other, it is this one. QoS in a virtualised environment is kind of the Holy Grail really. How can we guarantee a certain level of performance to our customers? Unfortunately, due to the complexity of the challenge, this will be a long forum post. You had better go and get a coffee and make yourself comfortable :)

Before I go into an in-depth explanation, let me introduce myself. I am the COO and acting CTO of XCalibre. My team and I have been working on this platform for the last 18 months. Besides my operational responsibilities, I have been working on this new platform day and night. I haven’t started dreaming in FlexiScale API calls …yet, but I think it is just a matter of time.

Background
Let’s go back to the very beginning. You could claim that virtualisation is nothing other than a very advanced version of shared hosting. There are a number of key differences though:


Each Virtual Dedicated Server runs in its own user space, therefore allowing you to mix Windows Server with Linux and various versions of Apache, MySQL and PHP. This was not possible with Shared Hosting and was a problem if a customer wanted slightly different LAMP stack versions than the average user.


The Xen engine provides increasingly good control over the CPU an individual guest system can consume. We will be doing an upgrade to a new Xen engine on October 6th that will give us massively improved CPU control compared to what we have today. Again, Shared Hosting didn’t provide any control over how much an individual customer was using, making the platform unpredictable.


So what are the key contention factors in a platform like FlexiScale?
1. Memory
2. CPU
3. Network Bandwidth
4. Storage Bandwidth

Memory
While Memory is a big factor for the provider, it isn’t really the same for the customer. Once the Memory is allocated to you, it is yours and yours only. It is not shared in any form or shape.

CPU
Things are a bit different for CPU. The total of available CPU resources in a physical server is shared among all customers that are currently ‘sitting’ on that box. It is correct that other customers running on the same physical box as you can have an impact on your performance.

As mentioned before though, Xen provides us with a very good level of control over CPU. We can control how much of the overall CPU of the physical server a single guest system can consume (from 1-100%). This already gives us a good way to avoid a customer ’stealing’ your CPU cycles. Besides that, Xen has a built-in CPU scheduler that lets you play with priorities and multiple virtual CPUs at guest level. This again will stop someone dominating the CPU consumption. We currently give all customers the same level of priority, but we will be launching a detailed study in November that will look into the impact of giving customers different levels of priority and numbers of virtual CPUs.
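To make the cap-plus-priority idea concrete, here is a minimal sketch of a proportional-share allocation. It is a simplified model, not Xen's actual scheduler code and not FlexiScale's configuration: the function name `allocate_cpu`, the guest names and all the numbers are made up for illustration. Each guest has a weight (priority), a cap (the most it may ever use) and a current demand, all expressed as a percentage of one physical server.

```python
def allocate_cpu(guests, capacity=100.0):
    """Share `capacity` (percent of a host) among guests by weight.

    `guests` maps a name to {"weight": w, "cap": c, "demand": d}.
    A guest never receives more than min(cap, demand); CPU it cannot
    use is redistributed to the remaining guests by weight.
    Illustrative model only, not the Xen scheduler itself.
    """
    alloc = {name: 0.0 for name in guests}
    active = set(guests)          # guests that could still use more CPU
    remaining = capacity
    while active and remaining > 1e-9:
        total_weight = sum(guests[n]["weight"] for n in active)
        satisfied = set()
        distributed = 0.0
        for n in active:
            fair = remaining * guests[n]["weight"] / total_weight
            limit = min(guests[n]["cap"], guests[n]["demand"]) - alloc[n]
            give = min(fair, limit)
            alloc[n] += give
            distributed += give
            if give >= limit - 1e-9:
                satisfied.add(n)  # hit its cap or its demand
        remaining -= distributed
        active -= satisfied
        if not satisfied:
            break                 # everyone took a full fair share
    return alloc


if __name__ == "__main__":
    # Hypothetical guests: one greedy, one mostly idle, one capped at 30%.
    demo = {
        "noisy":  {"weight": 1, "cap": 100, "demand": 100},
        "quiet":  {"weight": 1, "cap": 100, "demand": 20},
        "capped": {"weight": 1, "cap": 30,  "demand": 100},
    }
    for name, share in allocate_cpu(demo).items():
        print(f"{name}: {share:.1f}%")
```

In this model, with equal weights, a guest that wants CPU always receives at least its equal share of the host, however greedy its neighbours are. That is the kind of worst-case floor the original question asks about, though whether a real platform delivers it depends on how many guests share the box.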

Network Bandwidth
As far as Network Bandwidth is concerned, this is shared as well. However, in our view, this should never be a problem. In total, each of our physical servers has multi-Gigabit Network Bandwidth available. Different types of Bandwidth are totally separated - user, management, inter-server traffic and storage access.

Beyond the Bandwidth for an individual server, we will always ensure that we have plenty of free Bandwidth in our switched/routed network architecture and in our IP transits into the internet. This is not FlexiScale-specific; it is something we have been doing for the last 10 years.

Therefore, within reason, you should never even notice if another customer, whether on the same server, another server or another rack, pushes a very large amount of traffic (e.g. the Cheddarvision example, which did not affect any other customer adversely).

Storage Capacity & Bandwidth
Available Storage Bandwidth is determined by the network (physically separate from the other bandwidth) connecting the physical servers to the storage backend and by the capability of the storage controller to handle I/O requests. Storage Capacity is simply dependent on how many hard disks you bought, as simple as that.

FlexiScale is built upon a top-quality high-end storage product from a Tier 1 manufacturer that can scale up in terms of Storage Capacity and Storage Bandwidth on demand. We closely monitor our storage backend, which provides us with comprehensive management tools, and will always add more resources before our customers notice.

The missing link
The key problem is that there is currently no applicable performance benchmark available for providers like us. We are in talks with Amazon (EC2 group) and others to move things forward.

Obviously, we are closely monitoring our competitors and what kind of information they provide to their customers. Notably, Amazon EC2 provides some form of performance figure when they quote that each virtual machine has the equivalent of a 1.7GHz x86 processor. I am personally not really sure if this has anything to do with a relative or absolute performance measure. Check http://en.wikipedia.org/wiki/X86 and judge for yourself. x86 stands for an instruction set and does not refer to a specific type of processor, e.g. Pentium or Xeon.

Not surprisingly, Amazon very recently got into trouble with their performance numbers:
http://developer.amazonwebservices.com/connect/thread.jspa?threadID=16912&tstart=60
http://developer.amazonwebservices.com/connect/thread.jspa?threadID=12297&tstart=60

We don’t want to repeat that mistake.

My CEO pushed me hard to release a number, too, but the core team came to the conclusion that while we would love to do so, we do not feel comfortable doing it. However, we haven’t ignored the challenge and have been actively tackling it in our own way with short-term and strategic long-term solutions.

What are we doing to address the issue today
There are a number of things we are already doing to provide customers with consistent performance. These are:


Since we are providing a high-end virtualisation platform, we are using a relatively low level of server contention (number of VDSs per processor CPU). This is really the most important factor when looking into predictable performance. If you pile too many guest systems into a single piece of HW, even the best monitoring system and management platform cannot provide you with good performance.



We collect detailed information in real-time about all key performance factors of our platform – available memory, CPU, network bandwidth and storage capacity/bandwidth. We can look at the general trends and purchase new equipment accordingly. This is really important – we will NEVER run out of free capacity. The more customers buy from us, the more of the profits will be directly re-invested to scale the platform up. Just imagine the faces of my sales guys if I told them that they have to stop selling to all these hungry customers because I would rather retain all profits for myself!



Our management platform monitors the server load of each individual server in real-time and should one physical server move into amber/red, it will automatically select one or more VDSs that will move (LiveMigrate) to another server with lower CPU load. This process is fully transparent to the end-user. Xen guest systems are agnostic of physical HW and can be moved to any physical server in the same cluster.



Looking beyond a single individual server, it could well make sense to buy two smaller (2x 512MB instead of 1x 1GB) FlexiScale servers and load-balance them. With linear memory costs, this could be a more effective solution. Furthermore, it allows you to do your own application upgrades without taking the service down (not true for all architectures, needs to be reviewed individually) and it spreads the risk.
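The automatic rebalancing described above (a physical server going amber/red and one or more VDSs being live-migrated to a less loaded server) can be sketched roughly as follows. This is only an illustration under stated assumptions: the 75% threshold, the "move the busiest VDS to the least loaded host" rule, and the names `plan_migrations`, `h1`, `vds-a` and so on are all hypothetical, not FlexiScale's actual management logic.

```python
AMBER_THRESHOLD = 75  # hypothetical: percent CPU load at which a host turns amber


def plan_migrations(hosts, threshold=AMBER_THRESHOLD):
    """Return a list of (vds, source_host, destination_host) moves.

    `hosts` maps a host name to {vds_name: cpu_load_percent_of_host}.
    While a host is above `threshold`, its busiest VDS is moved to the
    least loaded other host, provided that host stays at or below the
    threshold afterwards. Illustrative heuristic only.
    """
    placement = {h: dict(v) for h, v in hosts.items()}  # work on a copy
    load = {h: sum(v.values()) for h, v in placement.items()}
    moves = []
    # Deal with the most heavily loaded hosts first.
    for src in sorted(load, key=load.get, reverse=True):
        while load[src] > threshold and placement[src]:
            vds = max(placement[src], key=placement[src].get)
            cost = placement[src][vds]
            dst = min((h for h in load if h != src), key=load.get, default=None)
            if dst is None or load[dst] + cost > threshold:
                break  # nowhere suitable to move it; more hardware is needed
            placement[dst][vds] = placement[src].pop(vds)
            load[src] -= cost
            load[dst] += cost
            moves.append((vds, src, dst))
    return moves


if __name__ == "__main__":
    # Hypothetical cluster: h1 is overloaded at 90%.
    cluster = {
        "h1": {"vds-a": 50, "vds-b": 30, "vds-c": 10},
        "h2": {"vds-d": 20},
        "h3": {"vds-e": 40},
    }
    for vds, src, dst in plan_migrations(cluster):
        print(f"migrate {vds}: {src} -> {dst}")
```

The `break` branch is the interesting case: if no other host has headroom, migration cannot help, which is why the low contention ratio and the capacity planning described above matter more than the migration mechanism itself.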


There are a few things we are in the process of doing. These are:

We will kick off a detailed study of how the new Xen Resource Scheduler works in November 2007. It lets you play with VDS CPU priority and number of virtual CPUs within the VDS. We need to understand how these two parameters affect a fully loaded server and the individual VDSs. It is our intention to release a more granular per-hour pricing as soon as we are confident enough in our understanding of how these two parameters interact and therefore justify different pricing.



We have launched a major research project with a prestigious UK university that in a nutshell will eventually allow us to provide QoS. Details will be released closer to a launch date.


OK, finished with the rambling. The answer to your question, "I need to know that even in a WORST CASE scenario, my servers will get a minimum level of CPU", is that you will always get a good level of performance from FlexiScale. We offer a high-quality product and good performance is a key to that. Our reputation hinges on it.

Totally predictable VDS performance (QoS) will become reality once we release the first results from our research project. Additionally, we will soon be more specific about our performance once a formal body publishes an officially recognised benchmark.

Watch this space.

Phil Huber, CTO XCalibre Communications Ltd.

Julez
6th October 2007, 09:19
Thank you for your detailed reply, which coming from the CTO has exceeded my expectations somewhat. Apologies for the delay in replying - I neglected to subscribe to the thread so didn't get the notification of your reply.

I've heard good things about XCalibre from mates but have had no dealings myself, until now. A certain type of hosting company simply deletes awkward or negative forum posts. The type of hosting company that ignores problems and hides failures is the type to be avoided, so you have passed the first test!

I also want to avoid the host offering 'unlimited' sites with 'unlimited' bandwidth for a low fixed monthly fee. They have an irresistible temptation to minimise investment and keep signing as many customers as possible. There's a conflict of interest built into that business model.

However, your metering approach means that your high traffic clients come with corresponding revenue which it's in your interests to re-invest, because if you run out of resources then you're losing revenue. Ordinary sites will find this kind of hosting cheaper than a dedicated server but sites with very high traffic and no profit are deterred. Perfect.

This, combined with your technical information and roadmap convinced me to buy a test server. I have used a monitoring service sampling every 5 minutes for the last week, monitoring a flexiscale virtual server and a dedicated server. The flexiscale server was far quicker and far more consistent - a far flatter performance graph than the dedi. Maybe that's because the flexiscale system is new and fairly empty. I'll keep an eye on the monitoring results.

The control panel, server setup, technical support have all been good.

Good luck with today's upgrade. I look forward to playing with the new control panel features.

Cheers,
Julez

terrycojones
8th October 2007, 15:24
Someone tell me that the above thread isn't like Jeff Gannon lobbing softballs to president Bush and then applauding as he hits them out of the park......

Sorry, I'm a bit of a skeptic :)

Julez
8th October 2007, 15:56
Someone tell me that the above thread isn't like Jeff Gannon lobbing softballs to president Bush and then applauding as he hits them out of the park......

Sorry, I'm a bit of a skeptic :)

That wasn't a softball. Despite their best efforts, it's still their Achilles heel in my view, so not something they'd want to draw attention to unnecessarily. If you read carefully, they are not able to give me the CPU guarantee that I have asked for. "Trust us, we know what we're doing" is what it amounts to. Rather than that, I have decided to judge them by results.

And I'm not Jeff Gannon - See my company's web site link to my profile.

Cheers, Jules

terrycojones
8th October 2007, 21:12
Hi Jules

Thanks for the reply. I was only 1/2 serious, but it's reassuring anyway :-)

I agree with your assessment of Phil's answer. As you say, it boils down to performance.

Phil, I've spent years at prestigious UK universities and various other "prestigious" places. I'd save your cash and time and focus on delivering product and performance. While I suppose those sorts of things still impress some, word of mouth (and forum) is much more important these days - especially seeing as the waters of "University studies show..." were long since muddied. So you commissioned a study eh?

Anyway, I'll check out FlexiScale. It's great to see a competitor to Amazon. I think you should compete on terms of service. I find Amazon's quite worrying, and a little competition should make a difference. Can you (FlexiScale) take advantage of legal differences to offer privacy or other angles that Amazon cannot provide?

Terry

phuber
10th October 2007, 17:37
Terry,

thanks very much for your honest and direct response. This is the style I personally like and I want to use for the future.

To add a bit more to my previous post, I am very pleased to announce two new cornerstones towards predictable performance. We took some strategic decisions over the last two weeks regarding SLAs for FlexiScale.

1) We will be offering a platform availability SLA of 99.99%, but the details of how this is to be applied to a virtualised environment like FlexiScale are still being worked out.
The target publishing date of the SLA is end of October. We are still working on a plan to have the monitoring and publishing implemented.

2) After long deliberation and in the absence of a formal performance standard (see http://www.spec.org/specvirtualization/), we decided to move forward with our own version of a Virtualisation Performance Standard that will provide our customers with a reliable performance indicator until Spec.org provides a commonly used and accepted new standard.
The target publishing date of the standard is mid-November. We plan to have the monitoring and publishing of performance data implemented by the end of November.
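
For a sense of scale on the 99.99% availability figure in point 1, the arithmetic below converts an availability percentage into the downtime it permits. It assumes availability is measured as uptime divided by total time over the period; as the post says, how the SLA will actually be measured for a virtualised platform was still being worked out, so the function name and the measurement periods here are illustrative only.

```python
def allowed_downtime_minutes(availability_pct, period_days):
    """Minutes of downtime permitted by an availability target.

    Assumes availability = uptime / total time over `period_days`.
    """
    total_minutes = period_days * 24 * 60
    return (1 - availability_pct / 100) * total_minutes


if __name__ == "__main__":
    for days, label in ((30, "30-day month"), (365, "365-day year")):
        minutes = allowed_downtime_minutes(99.99, days)
        print(f"99.99% over a {label}: {minutes:.2f} minutes")
```

On that assumption, 99.99% allows roughly 4.3 minutes of downtime in a 30-day month and roughly 52.6 minutes in a 365-day year.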

Cheers,
Phil