Niagara Data Platform

Niagara is a Fast & Big Data Processing, Machine Learning, and Data-as-a-Service platform, implemented in Scala with SDACK stack. It is built on complicated public data sets to evaluate emerging Stateful Stream Processing to build lightweight Streaming Services.

SDACK Tech Stack

  • The batch analytic engine: Spark (Spark Streaming, SQL, MLlib)

  • The lightweight container: Docker (Kubernetes)

  • The real-time view: Akka (Akka Streams, Http, Alpakka)

  • The scalable storage: Cassandra

  • The distributed message broker: Kafka (Kafka Streams, Connects, Schema Registry)

Dataset

  • The Yelp Dataset contains 4.1 million reviews(3.5GB) by 1 million users(1.2GB) for 144K businesses(115MB). https://www.yelp.ca/dataset_challenge

  • The Stack Exchange Dataset contains 28 million Posts in a 40GB single XML file. https://archive.org/details/stackexchange

Modules

  • Data Streaming (Kafka, Spark, Akka)

  • CQRS & Event Sourcing

  • Machine Learning