Course Outline
Introduction
- Overview of Spark and Hadoop features and architecture
- Understanding big data
- Python programming basics
Getting Started
- Setting up Python, Spark, and Hadoop
- Understanding data structures in Python
- Understanding PySpark API
- Understanding HDFS and MapReduce
Integrating Spark and Hadoop with Python
- Implementing Spark RDD in Python
- Processing data using MapReduce
- Creating distributed datasets in HDFS
Machine Learning with Spark MLlib
Processing Big Data with Spark Streaming
Working with Recommender Systems
Working with Kafka, Sqoop, Kafka, and Flume
Apache Mahout with Spark and Hadoop
Troubleshooting
Summary and Next Steps
Requirements
- Experience with Spark and Hadoop
- Python programming experience
Audience
- Data scientists
- Developers
Testimonials (3)
The fact that we were able to take with us most of the information/course/presentation/exercises done, so that we can look over them and perhaps redo what we didint understand first time or improve what we already did.
Raul Mihail Rat - Accenture Industrial SS
Course - Python, Spark, and Hadoop for Big Data
I liked that it managed to lay the foundations of the topic and go to some quite advanced exercises. Also provided easy ways to write/test the code.
Ionut Goga - Accenture Industrial SS
Course - Python, Spark, and Hadoop for Big Data
The live examples