Portable Stream and Batch Processing with Apache Beam

Stream processing is increasingly relevant in today’s world of big data, thanks to the lower latency, higher-value results, and more predictable resource utilization afforded by stream processing engines. At the same time, without a solid understanding of the necessary building blocks, streaming can feel like a complex and subtle beast. It doesn’t have to be that way.

Join the Google Open Source team for a tour of stream processing concepts via a walkthrough of Apache Beam, the easiest-to-use yet most sophisticated stream processing model on the planet. You'll explore a series of examples that shed light on the important topics of windowing, watermarks, and triggers; observe firsthand the different shapes of materialized output made possible by the flexibility of the Beam streaming model; experience the portability afforded by Beam as you work through examples using the runner of your choice (Apache Flink, Apache Spark, or Google Cloud Dataflow); and interact with engineers who have years of experience with massive-scale stream processing.
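For attendees new to the terminology, the idea behind event-time windowing can be sketched in plain Python. This is an illustration only and does not use Beam's API; the function names, the 60-second window size, and the sample events are assumptions made for the example.

```python
from collections import defaultdict

WINDOW_SIZE = 60  # seconds; hypothetical fixed-window length


def window_start(event_time, size=WINDOW_SIZE):
    """Return the start of the fixed window that contains event_time."""
    return event_time - (event_time % size)


def sum_scores_per_window(events, size=WINDOW_SIZE):
    """Sum scores per (window start, user), grouping by event time."""
    totals = defaultdict(int)
    for user, score, event_time in events:
        totals[(window_start(event_time, size), user)] += score
    return dict(totals)


# Hypothetical (user, score, event_time_in_seconds) records, in arrival order.
events = [
    ("ana", 5, 12),
    ("ben", 3, 47),
    ("ana", 7, 65),
    ("ana", 2, 58),  # arrives after the t=65 event but belongs to window [0, 60)
]

result = sum_scores_per_window(events)
```

Because records are grouped by when they happened rather than when they arrived, the late record still lands in the first window. A real streaming system also needs a watermark to estimate when a window's input is complete and triggers to decide when to emit results; this sketch leaves both out.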

Requirements

  • Laptop.
  • GitHub account.
  • Any initial setup for the Beam execution engine of your choice (Flink, Spark, or Cloud Dataflow) already completed.
  • Familiarity with the high-level concepts of distributed data processing. Familiarity with Flink, Spark, or Cloud Dataflow is a plus.

Course Dates: Feb 10

Location: Mexico City, MX

Course Length: 6 hrs

Onsite limited: 30 students

Language: English

Lecturers

Learn from Google's Apache Beam experts

Pablo Estrada

Software Engineer at Google

Pablo is a Software Engineer from Mexico City. He lives in Seattle and works on making Google Cloud Dataflow the best runner for Beam. He has worked all across the stack, mostly in Python and Java. His favorite activities are traveling and getting drunk with the locals.

Mariann Nagy

User Experience Researcher

Originally from Hungary, Mariann joined Google on a dare in 2012. She was part of the Global Business Organization until she made a ladder transfer into UX Research, as the realm of UX seemed like a better fit for her. Her first project was gHire, Google's internal hiring system, used both by employees who interview candidates and by recruiters who are constantly looking for new talent. In 2016 she moved to Seattle, WA, after transitioning into Cloud's Data Analytics team.
Mariann currently concentrates her efforts on improving Google Cloud's Big Data offerings. She works on BigQuery and Dataflow, and also spends some time on Machine Learning, as it comes up naturally during her user interviews and research studies.

Schedule

9:30 - 10:00 AM

Arrival, networking, environment setup

10:00 - 11:00 AM

Introduction to streaming concepts and Apache Beam

11:00 - 11:30 AM

Case study: developing a data processing pipeline for a mobile game

11:30 AM - 1:30 PM

Exercises

1:30 - 2:00 PM

Lunch!

2:00 - 3:00 PM

Unified batch and stream processing model in Apache Beam

3:00 - 5:00 PM

Exercises: LeaderBoard and GameStats

Want to stay in the loop?

Sign up to receive notifications about upcoming Wizeline Academy courses

Interested in sharing your expertise at Wizeline Academy? Send us an email at academy@wizeline.com.