Amazon Web Services Planning Real-Time, Big Data Stream Processing Service

According to a job ad posted by Amazon Web Services, the company is planning to launch a new service for processing streams of big data. Here’s the job listing:

Amazon Web Services is looking for a Senior Development Manager to lead the team that is building a disruptive new service for processing Big Data streams. Our globally distributed service must be able to process over 2 million records per second at launch, and eventually scale to handle over 100x that traffic. It is equally critical that our platform provide highly available, highly reliable processing of data in near-realtime. If you’re a distributed systems guru who thinks about data in terms of exabytes, this is your dream job.

We are seeking a seasoned manager and technical thought leader with a strong track record of owning successful products. You’ll build and manage a strong team in a fast-paced, startup-like environment. As a technical leader, you’ll need to be a pragmatic visionary that can translate business needs into workable technology solutions that scale both technologically and operationally. Given our rapid growth, you’ll need to be able to lead the organization through change, evolution, and sustained growth as well as be able and willing to roll up your sleeves and get hands on when necessary.

Basic Qualifications
* 5+ years experience building and managing high performance software teams
* 5+ years of OO software development experience in C++ or Java (preferably both)
* Bachelor’s or Master’s degree in Computer Science or a related discipline
* Experience building and operating extremely high volume and highly scalable web services
* Experience building and operating highly available, 24×7 services

Preferred Qualifications
* Experience building and operating globally distributed systems
* Experience with Hadoop, MapReduce, or other Big Data processing platforms

It’s not entirely clear from the listing whether AWS is building its own stream processing software or will offer a service based on existing software (such as Storm, Apache S4, HStreaming, or StreamBase). But the listing makes it sound as though AWS is building its own processing system based on Hadoop.

There have been a few academic projects to implement stream computing on AWS (see here and here for example), and HStreaming can be implemented on AWS as well.

Real-time data processing was a hot topic at Hadoop World. The session on real-time analytics with Hadoop overflowed.