Portable Stream and Batch Processing with Apache Beam

Stream processing is increasingly relevant in today’s world of big data, thanks to the lower latency, higher-value results, and more predictable resource utilization afforded by stream processing engines. At the same time, without a solid understanding of the necessary building blocks, streaming can feel like a complex and subtle beast. It doesn’t have to be that way.

Join Davor Bonaci for a tour of stream processing concepts via a walkthrough of Apache Beam, the easiest-to-use yet most sophisticated stream processing model on the planet. You’ll work through a series of examples that shed light on the key topics of windowing, watermarks, and triggers; observe firsthand the different shapes of materialized output made possible by the flexibility of the Beam streaming model; experience Beam’s portability by running the examples on the runner of your choice (Apache Flink, Apache Spark, or Google Cloud Dataflow); and interact with engineers who have years of experience with massive-scale stream processing.

Requirements

  • Laptop.
  • GitHub account.
  • Initial setup already completed for the Beam runner of your choice (Flink, Spark, or Cloud Dataflow).
  • Familiarity with high-level distributed data processing concepts; familiarity with Flink, Spark, or Cloud Dataflow is a plus.

Course Dates: December 2
Location: Guadalajara, MX
Course Length: 1 day
Capacity: Onsite, limited to 40 students
Language: English

Learn with the Best

Davor Bonaci

Apache Beam PMC Chair

Davor serves as chair of the Apache Beam Project Management Committee (PMC) and has regularly committed code to the project since its inception. He works as a Senior Software Engineer at Google.
Before Beam, Davor worked on its predecessor, Google Cloud Dataflow, from its beginnings, most recently leading the development of the Dataflow SDK for Java.

Griselda Cuevas

Open Source Program Manager

Gris Cuevas is an Open Source Program Manager at Google Cloud and an aspiring data scientist. She is currently pursuing a master’s degree in Operations Research and Data Science at UC Berkeley. Gris has spent the past seven years developing online communities and is now working with a research team at Google on the design of an algorithm that predicts author quality in online forums. Gris enjoys solving open-ended problems and spearheading solutions no one has designed before. She is learning to juggle, and she loves The Beatles and green tea in all forms.

Pablo

Software Engineer at Google

Pablo is a Software Engineer from Mexico City. He lives in Seattle, where he works on making Google Cloud Dataflow the best runner for Beam. He has worked all across the stack, mostly in Python and Java. His favorite activities are traveling and getting drunk with the locals.

Schedule

9:30 - 10:30 AM

Arrival, networking, environment setup

10:30 - 11:30 AM

Introduction to streaming concepts and Apache Beam

11:30 AM - 12:00 PM

Case study: developing a data processing pipeline for a mobile game

12:00 - 2:00 PM

Exercises

2:00 - 3:00 PM

Lunch!

3:00 - 4:00 PM

Unified batch and stream processing model in Apache Beam

4:00 - 5:00 PM

Exercises: LeaderBoard and GameStats

Interested in sharing your expertise at Wizeline Academy? Send us an email at academy@wizeline.com.