FlexiScale Forums

Closed Thread
 
#1
30th October 2008, 09:06
Flish
Junior Member
Join Date: Oct 2008
Posts: 8
Why?

I am angry, very angry, so yes, there's some vitriol in here. I was hoping that sleeping on it would dull that, but since all my servers are still down, it hasn't. Response and comments requested:

-Why does a 'highly available and scalable' service take in excess of 12 hours (and counting) to start up? Of course my servers are last. I assume this was a known factor in the planning phase, and deemed acceptable?
-Why does a 'connectivity issue' (which to me means a router or upstream failure) result in downing the entire system? Where's the redundancy?
-Why have you not learnt lessons from the last outage? The cause was almost comedy, and I left you to it, but the time it took you to restore was unacceptable. Why have you not learnt?
-Why is it that, when 50% of servers were restored at about 2am (and odds-on at least one of mine would be back), I still have none?
-Why do you have *any* single points of failure within the data centre?
-Why are you planning 60% capacity expansion in coming months if you can't manage what you have? Real-world experience tells me it just means it will take twice as long for servers to come back.
-Why do you have plans to scale the cloud over multiple DCs when one can't run properly?
-Why is the practical availability no better to me (as a user) than it was in Beta some 14 months ago?
-Why haven't you fixed the IO issues acknowledged months ago? I've scaled a low-user LAMP server from 1 CPU with 1MB to 3 CPUs with 4MB and MySQL is still shocking, and the Perl cron job (reads files, pumps them through ImageMagick, and outputs PDFs to disk) kills it.
-Why do you charge me for this?

And in general terms that I constantly bite my tongue about:

-Why does the Control Panel not work? I have *never* yet instructed a job that has happened without my having to follow it up with a support call.
-Why, if you know you can only process jobs one at a time, do you have the ability to schedule a job for a certain time? I do this and it simply means my server goes down at some point after that time, and doesn't come back. What you really mean is that you'll put the job into the queue at that time, which is no help to me in scheduling downtime.


Most of all, when my core client on this platform is a seasonal web app (Oct - Dec), assuming I get to keep them after this year's fiasco, should I not shove them back on the £50 pcm Fasthosts colo that worked perfectly and performed properly, which I had to use last year when you had a day-long outage? For the money I pay you I could self-build an HA cluster of 3 or more boxes that would wipe the floor with your current uptime and performance.

#2
30th October 2008, 09:12
tonylucas
FlexiScale CEO
Join Date: Jul 2007
Posts: 54

I will get back to you with a detailed (and honest!) reply to this in the next few hours. All resources are concentrated on getting the remaining servers up.

Tony Lucas
Chief Executive Officer
FlexiScale/XCalibre

#3
30th October 2008, 11:48
Flish
Junior Member
Join Date: Oct 2008
Posts: 8

Why, when all servers were restored 3 hours ago, are mine still down? What are the odds of that?

#4
30th October 2008, 14:31
RichText
Junior Member
Join Date: Oct 2008
Location: UK
Posts: 9

This is a shame... I really wanted to like Flexiscale. I gave it a few chances, but in the end, it just looks a bit too unreliable.

I'd rather have a service that works than one with an SLA that doesn't. If they want to compete with Amazon EC2 or Slicehost (not strictly 'cloud' but very flexible), then reliability is going to have to improve.

I wish Flexiscale every success in the future - I hope they get things sorted out (for themselves and their customers).

#5
31st October 2008, 14:12
Flish
Junior Member
Join Date: Oct 2008
Posts: 8

Just a follow-up for the record: I've had Tony Lucas from Flexiscale on the phone, running through some of my questions and giving a general, honest explanation of what happened. I won't repeat it here; Tony has promised to address this post in due course, so I will leave it to his own words. In the past he has always been brutally open when needed.

Just wanted to shine a little good light into the darkness, and credit Tony for being so open, a refreshing change.

#6
3rd November 2008, 18:27
tonylucas
FlexiScale CEO
Join Date: Jul 2007
Posts: 54

Quote:
Originally Posted by Flish
I am angry, very angry, so yes, there's some vitriol in here. I was hoping that sleeping on it would dull that, but since all my servers are still down, it hasn't. Response and comments requested:

-Why does a 'highly available and scalable' service take in excess of 12 hours (and counting) to start up? Of course my servers are last. I assume this was a known factor in the planning phase, and deemed acceptable?

This wasn't a known factor in the planning phase, and it is not something that any of us consider acceptable. It only came to light much later on, when specific software architecture was already in place. This is something we have already been working on replacing, but we will be stepping up these efforts, as I'll discuss in my post in the announcements forum.

Quote:
-Why does a 'connectivity issue' (which to me means a router or upstream failure) result in downing the entire system? Where's the redundancy?

This was an internal issue involving the switches between the storage network and the physical nodes that the virtual machines run on. Due to a software limitation it didn't fail over correctly, especially as we had both primary and backup switches fail in rapid succession. We are working on ways to improve this in future.

Quote:
-Why have you not learnt lessons from the last outage? The cause was almost comedy, and I left you to it, but the time it took you to restore was unacceptable. Why have you not learnt?

We learnt an awful lot of lessons from the last outage. Unfortunately there is a time delay between deciding that changes need to be made and actually being in a position to make them, especially as most of them relate to a commercial piece of software we use that we are replacing.

Quote:
-Why is it that, when 50% of servers were restored at about 2am (and odds-on at least one of mine would be back), I still have none?

Servers were restarted by customer number, not by server id (it's a long story as to why, but it makes more sense internally).
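
A minimal sketch of why the restart order matters, using made-up customer numbers and server ids (nothing here reflects FlexiScale's actual tooling or data). It compares which customers have no server at all once half the servers are back, depending on whether the restart queue is ordered by server id or by customer number:

```python
# Hypothetical (customer_number, server_id) pairs, invented for illustration.
servers = [
    (101, 1), (205, 2), (101, 3), (309, 4),
    (205, 5), (309, 6), (101, 7), (309, 8),
]


def restored_halfway(servers, key):
    """Return the first half of the restart queue when ordered by `key`."""
    queue = sorted(servers, key=key)
    return queue[: len(queue) // 2]


def customers_with_nothing(servers, restored):
    """Customers that have no server among the restored ones."""
    all_customers = {customer for customer, _ in servers}
    restored_customers = {customer for customer, _ in restored}
    return sorted(all_customers - restored_customers)


by_server_id = restored_halfway(servers, key=lambda s: s[1])
by_customer = restored_halfway(servers, key=lambda s: (s[0], s[1]))

print(customers_with_nothing(servers, by_server_id))
print(customers_with_nothing(servers, by_customer))
```

With this toy data, ordering by server id leaves every customer with at least one server at the 50% mark, while ordering by customer number leaves the highest-numbered customer with none until late in the restart.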

Quote:
-Why do you have *any* single points of failure within the data centre?

From a physical architecture point of view there are no single points of failure (dual power, dual networking, dual heads, etc.); however, the parts of the software that handle failover of this system were not functioning correctly.


Quote:
-Why are you planning 60% capacity expansion in coming months if you can't manage what you have? Real-world experience tells me it just means it will take twice as long for servers to come back.

This is why we have taken the decision to suspend all new signups until we can get these problems resolved. Reliability and stability are the two most important factors for us, above everything else, and we are not delivering on that to the level that we should be at present.

Quote:
-Why do you have plans to scale the cloud over multiple DCs when one can't run properly?

The architecture fixes we are putting in place will solve these problems completely, and expanding into multiple datacentres will increase redundancy further, not reduce it.


Quote:
-Why is the practical availability no better to me (as a user) than it was in Beta some 14 months ago?

It is a great disappointment to me that it is not. Hopefully some of the steps I'm outlining here (and further in the other post I will write shortly) will show how we will be eliminating these issues.

Quote:
-Why haven't you fixed the IO issues acknowledged months ago? I've scaled a low-user LAMP server from 1 CPU with 1MB to 3 CPUs with 4MB and MySQL is still shocking, and the Perl cron job (reads files, pumps them through ImageMagick, and outputs PDFs to disk) kills it.

We had successfully fixed these issues, but it does appear they have recurred with a few customers. It is of great interest to us to understand these better, so if any customers reading this are having IO issues, please do let us know so we can investigate.


Quote:
-Why do you charge me for this?

At the end of the day, this is designed as a commercial service. Where we haven't delivered up to the levels we suggested, we have offered compensation, but we are not in a position to run the service for free.

Quote:
And in general terms that I constantly bite my tongue about:

-Why does the Control Panel not work? I have *never* yet instructed a job that has happened without my having to follow it up with a support call.

This relates back to the same architecture issue, I'm afraid: sometimes jobs which should go through successfully fail. This is something that is in the process of being completely rewritten in-house to deliver exactly what it should do. Also, we do admit the CP is lacking in usability, and this is something else that is being addressed.

Quote:
-Why, if you know you can only process jobs one at a time, do you have the ability to schedule a job for a certain time? I do this and it simply means my server goes down at some point after that time, and doesn't come back. What you really mean is that you'll put the job into the queue at that time, which is no help to me in scheduling downtime.

Two points here. First, most of the time there isn't much of a queue (albeit that has been exacerbated in the last week), so there isn't always an issue with doing this. Secondly, and most importantly, the new system which is being written is fully multi-threaded and will only be limited by the number of physical machines in the platform.
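
A rough sketch of the difference between "scheduled for a time" and "queued at a time", under stated assumptions: the job durations, the FIFO ordering, and the worker counts are all hypothetical and are not taken from either the current or the new FlexiScale system. Each job enters the queue at its scheduled minute and starts only when a worker is free:

```python
import heapq


def start_times(jobs, workers):
    """Simulate a FIFO job queue.

    jobs: list of (scheduled_minute, duration_minutes), ordered by scheduled time.
    A job joins the queue at its scheduled minute and starts when a worker is idle.
    Returns the minute at which each job actually starts.
    """
    free_at = [0] * workers  # minute at which each worker next becomes idle
    heapq.heapify(free_at)
    starts = []
    for scheduled, duration in jobs:
        earliest_free = heapq.heappop(free_at)
        start = max(scheduled, earliest_free)
        starts.append(start)
        heapq.heappush(free_at, start + duration)
    return starts


# Hypothetical backlog: three 30-minute jobs queued at minute 0,
# then a customer's 5-minute job scheduled for minute 10.
jobs = [(0, 30), (0, 30), (0, 30), (10, 5)]

print(start_times(jobs, workers=1))
print(start_times(jobs, workers=4))
```

With one worker the customer's job starts at minute 90 rather than minute 10; with four workers it starts at minute 10 as scheduled, which is the behaviour the complaint is asking for.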

Quote:
Most of all, when my core client on this platform is a seasonal web app (Oct - Dec), assuming I get to keep them after this year's fiasco, should I not shove them back on the £50 pcm Fasthosts colo that worked perfectly and performed properly, which I had to use last year when you had a day-long outage? For the money I pay you I could self-build an HA cluster of 3 or more boxes that would wipe the floor with your current uptime and performance.

I hope in the coming weeks and months to change your mind on this point, although I know at this point it is actions, not words, that are needed.

I'll put more information in the new post I'll make shortly.

Regards,

Tony Lucas
CEO
FlexiScale/XCalibre

#7
3rd November 2008, 20:40
idega
Junior Member
Join Date: Nov 2008
Posts: 3
Fed up, give us our data and goodbye.

(Virtual)Ironically I had to write this post twice because even the Flexiscale forum system failed to save the last one...

This is our last day of tolerating Flexiscale's crap platform. There, I said it! It's total and utter crap. I don't care if you blame it on the software or the lack of proper redundancies.

We have been loyal customers for 7 months now, and in that time we have had 6 complete server blackouts, corruptions, read-only file systems, etc.
We have lost one customer because of these issues and will probably lose another one today, because our server has now been down for 5+ hours and we have no clue whether it's coming up: the "support" desk is closed and the ticket system is way behind (it's missing 2 tickets...).

Flexiscale has cost us 20-30 times more in damages and wasted time than we have paid them in hosting fees, and some would consider that grounds for a lawsuit considering their SLA. So in effect, choosing Flexiscale for our hosting needs has become my worst IT decision.

I expect 4 things from Flexiscale now:
1. Get our server up and running ASAP (id 1317)
2. Refund us our money
3. Apologise
4. Publicly state that the Flexiscale platform is back to alpha/beta status!

This reply is written in total frustration and anger, so I may have offended some of the hard-working people of Flexiscale, and I apologise up front for that. But these people don't seem to have a clue about the damage they are doing to other companies like ours by having an SLA that they can in no way guarantee.

Best regards
Eirikur Hrafnsson
Chief Software Engineer
Idega Software

#8
3rd November 2008, 22:22
tonylucas
FlexiScale CEO
Join Date: Jul 2007
Posts: 54

Eirikur,

I understand your server was being looked at when you wrote your post; a reply had gone out from one of our support engineers 20 minutes before.

Your server is now online as I understand it.

In regards to your points:

1, This has been achieved
2, I'm happy to discuss this, although I don't think it's correct to do this in a public forum.
3, As I've pointed out in my very long post above, I'm genuinely apologetic for the issues that everyone has experienced. My staff have worked flat out to solve these problems and I have enormous respect for them.
4, I have already publicly stated the current issues with the platform. I will be putting in a post to clarify this further, which will also go on the blog. We have also switched off signups for any new customers until we resolve these problems for good. The one thing we are not doing here is hiding behind an SLA, or putting revenue before customer satisfaction.

I have spent 11 years building this company up, in a competitive market, solely based around quality of service. It is quite clear to everyone that we have failed to deliver on this with FlexiScale of late, and I don't deny that, but we are also clear on what we are doing to resolve this, in the past, present & future. We have absolutely every intention of delivering on this.

Regards,

Tony.

Last edited by tonylucas; 3rd November 2008 at 22:24. Reason: typo

#9
3rd November 2008, 23:10
idega
Junior Member
Join Date: Nov 2008
Posts: 3
So long and thanks for all the fish (better?)

Hi Tony,
and thank you for one quick reply today. I do agree with other posts here that you are upfront about the problems, even though your front page still says "99.99% SLA", the network status page gets wiped clean pretty often, and you haven't exactly been discounting the hosting since you realised your platform doesn't deliver what it promises.

Our server is indeed online now after "only" 6-7 hours of downtime, and now I have the task of checking whether the data is corrupt.

I started a small software company myself and have managed it for 9 years now, and in fact that is the only reason we stayed so long, through all the disappointments at Flexiscale. We truly wanted and STILL want Flexiscale to succeed. HOWEVER, I'm sorry to say our business cannot afford to depend on Flexiscale, and therefore we will be moving our stuff elsewhere.
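
For scale, a back-of-the-envelope sketch of what a 99.99% availability figure allows. The 30-day month, the 365-day year, and the 6.5-hour midpoint of the "6-7 hours" above are assumptions; the SLA's actual measurement window and terms are not stated in this thread:

```python
def allowed_downtime_minutes(availability_pct, period_hours):
    """Downtime budget, in minutes, for a given availability over a period."""
    return (1 - availability_pct / 100) * period_hours * 60


def achieved_availability_pct(downtime_hours, period_hours):
    """Availability actually achieved, given total downtime in the period."""
    return (1 - downtime_hours / period_hours) * 100


MONTH_HOURS = 30 * 24   # assumed 30-day month
YEAR_HOURS = 365 * 24   # assumed 365-day year

print(round(allowed_downtime_minutes(99.99, MONTH_HOURS), 2))
print(round(allowed_downtime_minutes(99.99, YEAR_HOURS), 2))
print(round(achieved_availability_pct(6.5, MONTH_HOURS), 2))
```

Under those assumptions the budget is roughly 4.3 minutes a month or 52.6 minutes a year, so a single outage of about 6.5 hours works out to roughly 99.1% for that month.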

I can say this about your staff: they are polite and clearly hard-working (as I already stated). However, on more than one occasion, like today, I was told over and over again that the "developers" would call me, and the last couple of times I called it was made clear that this was to happen before the support phone closed. Then came hours of silence, which pushed me to write my post. I didn't want to do this "publicly", as you say, but I don't see that I had a choice, since I wasn't getting any responses.

-Eiki

#10
3rd November 2008, 23:35
tonylucas
FlexiScale CEO
Join Date: Jul 2007
Posts: 54

Eiki,

Thanks for your reply. We are working hard on improving the communication and feedback to our customers; obviously we're still falling short in places, but I will have this investigated tomorrow and get back to you.

In regards to the status page, it's not wiped clean to hide things; the current system only allows a limited number of posts to be displayed, so it naturally rotates older ones out. It is on our list of things to completely redo in a different way, though. In regards to discounting the hosting, at the end of the day we are a company in this to make a profit (one day at least!), so we have to keep pricing it sensibly, otherwise we end up in a downward spiral.

I respect your opinion regarding currently using the platform, and would be happy to work with you to see if we could offer you a dedicated server option in the meantime, until I can hopefully convince you of the stability of FlexiScale when we roll our new software out.

Regards,

Tony.


All times are GMT +1.