Data Engineering with Spark

This multi-week course helps working software engineers become comfortable with Spark. Topics include the fundamental concepts of large-scale data processing and the most important transformations. The course is hands-on: its exercises prepare a seasoned programmer to write data pipeline scripts that read, transform, and write data.

Course Dates: Aug 13 - Sept 6

Location: Guadalajara, MX

Course Length: 4 Weeks

Onsite (limited): 30 students

Language: English

Who should apply

Software engineers with mid-level to senior development skills

Engineers proficient in Python or Scala

Software engineers seeking to transition into data engineering

Independent learners

Lecturers

Carlos Zubieta

Data Engineer

Carlos Zubieta is currently pursuing a Master's degree in Computer Science at CIMAT. He previously worked at HP Labs for two years, where he optimized graph-analysis algorithms to run on large non-volatile memory systems, handling graphs with tens of billions of nodes (internet scale).

Ricardo Magaña

Data Engineer

Ricardo started working with data technologies while analyzing data from proton-antiproton collisions at Fermilab. He earned his PhD in High Energy Physics from Cinvestav. Prior to Wizeline, he built data infrastructure at Kueski and Hewlett-Packard. At Wizeline he works as a Data Engineer, creating pipelines and transforming data for media clients. His interests are distributed data processing, machine learning, algorithms, and HPC.

Abraham Alcantara

Data Engineer

Abraham studied Physics and Mathematics at IPN and has 10 years of experience in software development, including two publications in international journals. Before joining Wizeline, he worked on real-time and batch big data pipelines at Ooyala. He has professional experience with federated database engines, semantic technologies, language translators, cloud application engines, and auto-generated APIs.

Schedule

Spark Basics

Hello World

Architecture

Map-Reduce theory

Fault tolerance in Spark

Operations

Reduction operations

Working with key-value pairs: Transformations & Actions

Partitioning and Shuffling

Partitioning

Shuffling

Joins

Interested in sharing your expertise at Wizeline Academy? Send us an email at academy@wizeline.com.