Niagara Data Platform
Niagara is a fast and big data processing, machine learning, and data-as-a-service platform, implemented in Scala on the SDACK stack. It uses large, complex public datasets to evaluate emerging stateful stream processing approaches for building lightweight streaming services.
SDACK Tech Stack
- The batch analytic engine: Spark (Spark Streaming, SQL, MLlib)
- The lightweight container: Docker (Kubernetes)
- The real-time view: Akka (Akka Streams, Akka HTTP, Alpakka)
- The scalable storage: Cassandra
- The distributed message broker: Kafka (Kafka Streams, Kafka Connect, Schema Registry)
Dataset
- The Yelp Dataset contains 4.1 million reviews (3.5 GB) by 1 million users (1.2 GB) for 144K businesses (115 MB). https://www.yelp.ca/dataset_challenge
- The Stack Exchange Dataset contains 28 million posts in a single 40 GB XML file. https://archive.org/details/stackexchange
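
A single 40 GB XML file is too large to load into memory as a tree, so it generally has to be read as a stream of elements. The following is a minimal sketch of that idea using the JDK's StAX reader. It is not Niagara's actual ingestion code: the object name `PostCounter` is hypothetical, and the sketch assumes each post is a `<row>` element carrying a `PostTypeId` attribute.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

// Hypothetical helper, for illustration only.
object PostCounter {

  // Counts <row> elements per PostTypeId, reading one XML event at a time
  // so that memory use does not grow with the size of the file.
  // Assumption: posts are <row .../> elements with a PostTypeId attribute.
  def countByPostType(in: InputStream): Map[String, Long] = {
    val reader = XMLInputFactory.newInstance().createXMLStreamReader(in)
    val counts = scala.collection.mutable.Map.empty[String, Long].withDefaultValue(0L)
    try {
      while (reader.hasNext) {
        val event = reader.next()
        if (event == XMLStreamConstants.START_ELEMENT && reader.getLocalName == "row") {
          // Rows without the attribute are grouped under "unknown".
          val postType = Option(reader.getAttributeValue(null, "PostTypeId")).getOrElse("unknown")
          counts(postType) += 1L
        }
      }
    } finally reader.close()
    counts.toMap
  }

  def main(args: Array[String]): Unit = {
    // Tiny in-memory sample standing in for the real file.
    val sample =
      """<posts><row Id="1" PostTypeId="1" /><row Id="2" PostTypeId="2" /><row Id="3" PostTypeId="2" /></posts>"""
    println(countByPostType(new ByteArrayInputStream(sample.getBytes("UTF-8"))))
  }
}
```

For the real file, the same function could be given a `java.io.FileInputStream` wrapped in a `BufferedInputStream`. In a pipeline, each row would more likely be published to a Kafka topic than counted locally.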
Modules
- Data Streaming (Kafka, Spark, Akka)
- CQRS & Event Sourcing
- Machine Learning
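
As a rough illustration of the event sourcing idea behind the second module, the sketch below rebuilds a read-side view by replaying a log of events. It is an assumption-laden example rather than Niagara's implementation: the event types, `BusinessRatings`, and `replay` are all hypothetical names chosen to fit the Yelp review data.

```scala
// Hypothetical model, for illustration only.
object ReviewEventSourcing {

  // Write side: immutable facts appended to an event log.
  sealed trait ReviewEvent
  final case class ReviewAdded(reviewId: String, stars: Int) extends ReviewEvent
  final case class ReviewRemoved(reviewId: String) extends ReviewEvent

  // Read side: a query model derived purely from the events.
  final case class BusinessRatings(stars: Map[String, Int]) {
    def average: Option[Double] =
      if (stars.isEmpty) None
      else Some(stars.values.sum.toDouble / stars.size)
  }

  // Applying a single event is a pure function of (state, event).
  def applyEvent(state: BusinessRatings, event: ReviewEvent): BusinessRatings =
    event match {
      case ReviewAdded(id, s) => state.copy(stars = state.stars + (id -> s))
      case ReviewRemoved(id)  => state.copy(stars = state.stars - id)
    }

  // Current state is the left fold of the event log from an empty state.
  def replay(events: Seq[ReviewEvent]): BusinessRatings =
    events.foldLeft(BusinessRatings(Map.empty))(applyEvent)
}
```

Because the state is only ever derived from the log, the read model can be rebuilt or reshaped at any time by replaying the same events. A stateful stream processor maintains that fold incrementally as events arrive.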