Apache Spark is a general-purpose computing engine designed for large-scale data processing. Its popularity continues to grow, thanks in part to strong support from the Apache community. Many well-known companies use it to process petabytes of data on clusters of 8000+ nodes, with individual jobs running for weeks.
In this webinar, you will learn about:
- Apache Spark and how it relates to traditional Hadoop MapReduce technology
- What makes Spark so fast
- How to use Spark's rich APIs to design and run your ETL jobs
- Apache Spark's streaming capabilities for near-real-time updates and its role in Big Data processing scenarios
- Structured Streaming, a scalable and fault-tolerant stream processing engine that makes near-real-time processing scenarios even easier