This repository solves two Data Base Processing problems with Spark, one in batch and one in streaming, using Scala as the programming language. The streaming process is simulated, and a Kafka machine is used to process the streaming data.

ceblfe/Data_Base_Processing_KC

Data Base Processing

This repository contains the files used to solve two Data Base Processing problems with Spark, one in batch and one in streaming, using Scala as the programming language.

The contents of this repository are:

  1. in/
  2. interstelar/
  3. interstelar.pdf
  4. README.md

1. in/

The in/ folder contains all the data that we are using.

The data files are:

  1. historico_batch.csv
  2. naves_transporte.csv
  3. trayectos

With this data we simulate both processes: batch and streaming.

2. interstelar/

This folder contains the following Scala files:

  • KafkaConsumoMedio.scala
  • KafkaDifConsumo.scala
  • ListaTresMejores.scala
  • MediasConsumosBatch.scala

Each Scala file contains the code that solves one part of the problems posed in interstelar.pdf.

3. interstelar.pdf

This file contains the questions to be answered. They are listed below together with the file that answers each one:

Batch processing (these questions are solved with Spark SQL):

  • Ingestion of the data stored over the years by the navigation systems of the spacecraft and by the docking ports (batch mode): answered in MediasConsumosBatch.scala
  • Data cleaning: answered in MediasConsumosBatch.scala
  • Calculation of the mean consumption of every spacecraft in the fleet, grouped by spacecraft (each spacecraft has an identifier): answered in MediasConsumosBatch.scala
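
The cleaning and grouped-mean steps above can be illustrated with a minimal sketch in plain Scala collections. It is not the code in MediasConsumosBatch.scala, which uses Spark SQL. The two-column layout (identifier, consumption), the `Reading` case class, the `BatchMeansSketch` object and the rule that discards negative values are all assumptions made for illustration, since the real columns of historico_batch.csv are not described here. It relies on `toDoubleOption`, so it needs Scala 2.13 or later.

```scala
// Hypothetical record: the real columns of historico_batch.csv are not listed in this README.
final case class Reading(shipId: String, consumption: Double)

object BatchMeansSketch {
  // Data cleaning (assumed rules): keep only lines with exactly two fields,
  // a non-empty identifier and a numeric, non-negative consumption.
  // Header lines and malformed rows yield None and are dropped by flatMap.
  def parseLine(line: String): Option[Reading] =
    line.split(",", -1).map(_.trim) match {
      case Array(id, value) if id.nonEmpty =>
        value.toDoubleOption.filter(_ >= 0.0).map(v => Reading(id, v))
      case _ => None
    }

  // Mean consumption grouped by spacecraft identifier.
  def meanByShip(readings: Seq[Reading]): Map[String, Double] =
    readings.groupBy(_.shipId).map { case (id, rs) =>
      id -> rs.map(_.consumption).sum / rs.size
    }
}
```

A possible use is `BatchMeansSketch.meanByShip(lines.flatMap(BatchMeansSketch.parseLine))`, where `lines` is a `Seq[String]` holding the raw rows.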

Streaming processing (these questions are solved with Spark Streaming and a Kafka machine):

  • Real-time consumption data (Spark Streaming): answered in KafkaConsumoMedio.scala
  • Calculation, in real time, of the mean consumption of every spacecraft in the fleet, grouped by spacecraft (each spacecraft has an identifier): answered in KafkaConsumoMedio.scala
  • Processing of both datasets to obtain the difference between their average consumptions: answered in KafkaDifConsumo.scala
  • Obtaining a collection (List) of tuples (spacecraft identifier and model) with the three best transports: answered in ListaTresMejores.scala
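
The last two questions can be sketched in plain Scala as follows. This is only an illustration under stated assumptions, not the code in KafkaDifConsumo.scala or ListaTresMejores.scala. The object `ConsumptionComparisonSketch`, its method names and the `models` lookup are hypothetical. The sign convention (streaming mean minus batch mean) and the ranking rule (lowest difference is best) are also assumptions, because the criterion for the "three best transports" is not defined here.

```scala
object ConsumptionComparisonSketch {
  // Difference between the streaming mean and the batch mean,
  // computed only for spacecraft that appear in both datasets.
  def meanDifference(
      batch: Map[String, Double],
      streaming: Map[String, Double]
  ): Map[String, Double] =
    streaming.collect {
      case (id, mean) if batch.contains(id) => id -> (mean - batch(id))
    }

  // List of (identifier, model) tuples for the three spacecraft with the
  // lowest difference. Ties are broken by identifier so the result is stable.
  // Ranking by lowest difference is an assumed definition of "best".
  def topThree(
      diffs: Map[String, Double],
      models: Map[String, String]
  ): List[(String, String)] =
    diffs.toList
      .sortWith((a, b) => a._2 < b._2 || (a._2 == b._2 && a._1 < b._1))
      .take(3)
      .map { case (id, _) => (id, models.getOrElse(id, "unknown")) }
}
```

The result is a `List[(String, String)]`, which matches the collection of tuples that the question asks for. A spacecraft without an entry in `models` is reported with the placeholder model "unknown".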
