This repository solves two database-processing problems with Spark, one in batch mode and one in streaming mode. All code is written in Scala.
The repository contains the following files and directories:
- in/
- interstelar/
- interstelar.pdf
- README.md
The in/ directory contains all the data that we use.
The data files are:
- historico_batch.csv
- naves_transporte.csv
- trayectos
With this data we simulate both processes: batch and streaming.
The interstelar/ directory contains the following Scala source files:
- KafkaConsumoMedio.scala
- KafkaDifConsumo.scala
- ListaTresMejores.scala
- MediasConsumosBatch.scala
Each Scala file solves one part of the problems described in interstelar.pdf.
That document lists the questions we answer:
Batch processing (solved with Spark SQL):
- Ingestion of the data stored over the years by spacecraft navigation systems and docking ports (batch mode): answered in MediasConsumosBatch.scala
- Data cleaning: answered in MediasConsumosBatch.scala
- Calculation of the mean consumption of every spacecraft in the fleet, grouped by spacecraft (each spacecraft has an identifier): answered in MediasConsumosBatch.scala
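
The cleaning and grouped-mean steps can be illustrated with a minimal sketch in plain Scala collections. This is not the Spark SQL code from MediasConsumosBatch.scala. The `Reading` record, its field names, and the cleaning rule (drop rows with a blank identifier or a missing, NaN, or negative consumption) are assumptions made for illustration, since the real CSV layout is not described here.

```scala
// Hypothetical record layout: the real columns of historico_batch.csv are not documented here.
final case class Reading(shipId: String, consumption: Option[Double])

object BatchMeans {
  // Assumed cleaning rule: keep only rows with a non-blank id and a
  // present, non-NaN, non-negative consumption value.
  def clean(rows: Seq[Reading]): Seq[(String, Double)] =
    rows.collect {
      case Reading(id, Some(c)) if id.trim.nonEmpty && !c.isNaN && c >= 0.0 =>
        (id.trim, c)
    }

  // Mean consumption per spacecraft identifier, computed on the cleaned rows.
  def meanByShip(rows: Seq[Reading]): Map[String, Double] =
    clean(rows)
      .groupBy { case (id, _) => id }
      .map { case (id, values) => id -> values.map(_._2).sum / values.size }
}
```

For example, two readings of 10.0 and 20.0 for the same spacecraft give a mean of 15.0, and a row with no consumption value is ignored instead of being counted as zero.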
Streaming processing (solved with Spark Streaming and a Kafka server):
- Real-time consumption data (Spark Streaming): answered in KafkaConsumoMedio.scala
- Calculation, in real time, of the mean consumption of every spacecraft in the fleet, grouped by spacecraft (each spacecraft has an identifier): answered in KafkaConsumoMedio.scala
- Joint processing of both datasets to obtain the difference between their average consumptions: answered in KafkaDifConsumo.scala
- Building a collection (List) of tuples (spacecraft identifier and model) with the three best transports: answered in ListaTresMejores.scala
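
The last two questions can be sketched in plain Scala collections once the per-spacecraft means are available. This is only an illustration, not the code in KafkaDifConsumo.scala or ListaTresMejores.scala. Several things are assumptions: the name `FleetRanking`, the sign convention (streaming mean minus batch mean), the reading of "best" as lowest mean consumption, and the `models` lookup from identifier to model.

```scala
object FleetRanking {
  // Difference between averages (streaming minus batch, an assumed convention)
  // for spacecraft that appear in both data sets.
  def consumptionDiff(batch: Map[String, Double],
                      stream: Map[String, Double]): Map[String, Double] =
    stream.collect { case (id, s) if batch.contains(id) => id -> (s - batch(id)) }

  // Three best transports as (identifier, model) tuples.
  // Assumption: "best" means lowest mean consumption; ties are broken by identifier.
  // Spacecraft without a known model are skipped.
  def topThree(means: Map[String, Double],
               models: Map[String, String]): List[(String, String)] =
    means.toList
      .collect { case (id, mean) if models.contains(id) => (mean, id, models(id)) }
      .sortWith { (a, b) =>
        if (a._1 != b._1) a._1 < b._1 else a._2.compareTo(b._2) < 0
      }
      .take(3)
      .map { case (_, id, model) => (id, model) }
}
```

If the project ranks transports by a different criterion, only the comparison inside `sortWith` needs to change.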