anjijava16 / spark-1

This project forked from spider-123-eng/spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. This project contains sample programs for Spark in Scala.

Home Page: https://github.com/Re1tReddy/Spark

Scala 100.00%

spark-1's Introduction

Spark

Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs to allow data workers to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets. With Spark running on Apache Hadoop YARN, developers everywhere can now create applications to exploit Spark’s power, derive insights, and enrich their data science workloads within a single, shared dataset in Hadoop.

This project contains programs for Spark in the Scala language.

Topics Covered in Spark 2.1

Implementing custom UDF, UDAF, and Partitioner using Spark 2.1
Working with DataFrames (ComplexSchema, DropDuplicates, DatasetConversion, GroupingAndAggregation)
Working with Datasets
Working with Parquet files
Partitioning the data by a specific column and storing it partition-wise
Loading data from a Cassandra table using Spark
Working with the Spark Catalog API to access Hive tables
Inserting data into Hive tables (managed, external) from Spark
Inserting data into partitioned Hive tables in Parquet format (managed, external) from Spark
Adding and listing partitions of a Hive table using Spark
CRUD operations on Cassandra using Spark
Reading from and writing to S3 buckets using Spark
Spark MongoDB integration
Pushing Spark accumulator values as metrics to the DataDog API
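
As an illustration of two of the Spark 2.1 topics above (a custom UDF, and partitioning data by a column when writing Parquet), here is a minimal sketch. It is not taken from the repository: the object name, the sample rows, the column names, and the output path `/tmp/people_by_country` are all assumptions made for the example, and it assumes Spark 2.x is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical example object; names and data are illustrative only.
object UdfAndPartitionedParquetSketch {

  // Plain Scala function that the UDF wraps; null-safe so null names pass through.
  def capitalizeName(s: String): String =
    Option(s).map(_.capitalize).orNull

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-partitioned-parquet-sketch")
      .master("local[*]") // assumption: local run, not a cluster
      .getOrCreate()
    import spark.implicits._

    // Small in-memory DataFrame standing in for real input data.
    val people = Seq(("alice", "IN", 31), ("bob", "US", 45), ("carol", "IN", 27))
      .toDF("name", "country", "age")

    // Wrap the Scala function as a Spark SQL UDF and apply it to a column.
    val capitalize = udf(capitalizeName _)
    val cleaned = people.withColumn("name", capitalize(col("name")))

    // Write one sub-directory per distinct value of "country" (country=IN, country=US).
    cleaned.write
      .mode("overwrite")
      .partitionBy("country")
      .parquet("/tmp/people_by_country") // hypothetical output path

    // Reading the root directory restores "country" as a regular column.
    spark.read.parquet("/tmp/people_by_country").show()

    spark.stop()
  }
}
```

Keeping the transformation in an ordinary Scala function and wrapping it with `udf` afterwards makes the logic testable without starting a Spark session.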

Topics Covered in Spark 1.5

Spark transformations.
Spark to Cassandra connection and storage.
Spark to Cassandra CRUD operations.
Reading data from Cassandra using Spark Streaming (Cassandra as source).
Spark Kafka integration.
Spark Streaming with Kafka.
Storing Spark Streaming data into HDFS.
Storing Spark Streaming data into Cassandra.
Spark DataFrames API (joining two DataFrames, sorting, wildcard search, orderBy, aggregations).
Spark SQL.
Spark HiveContext (loading ORC, text, and Parquet data from Hive tables).
Kafka producer.
Kafka consumer through Spark integration with Kafka.
Spark file streaming.
Spark socket streaming.
Spark JDBC connection.
Overcoming Scala case class limitations by using StructType.
Working with CSV, JSON, XML, ORC, and Parquet data files in Spark.
Working with Avro and SequenceFiles in Spark.
Spark joins.
Spark window vs. sliding interval.
Spark aggregations using the DataFrame API.
Writing a custom UDF and UDAF in Spark.
Storing data as text and Parquet files in HDFS.
Integrating Spark with MongoDB.
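
To illustrate the StructType topic in the list above: instead of deriving a DataFrame schema from a case class, the schema can be declared programmatically and applied to an RDD of `Row` objects. The sketch below uses the Spark 1.x style `SQLContext` entry point. It is not the repository's code: the object name, the three-column schema, and the sample lines are assumptions, and the limitation being worked around is presumably the 22-field cap on case classes in Scala 2.10.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical example object; schema and data are illustrative only.
object StructTypeSchemaSketch {

  // Schema declared as data, so it can hold any number of fields.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true),
    StructField("city", StringType, nullable = true)
  ))

  // Parse one comma-separated line into a Row matching the schema above.
  def toRow(line: String): Row = {
    val parts = line.split(",", -1) // -1 keeps trailing empty fields
    Row(parts(0).trim.toInt, parts(1).trim, parts(2).trim)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("structtype-sketch")
      .setMaster("local[*]") // assumption: local run
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // In-memory lines standing in for a real text file.
    val lines = sc.parallelize(Seq("1,alice,Hyderabad", "2,bob,Chennai"))

    // Pair the RDD[Row] with the explicit schema to get a DataFrame.
    val df = sqlContext.createDataFrame(lines.map(toRow), schema)
    df.printSchema()
    df.show()

    sc.stop()
  }
}
```

Because the schema is an ordinary value, it could also be built at runtime, for example from a header line, which a case class does not allow.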


You can reach me with any suggestions or clarifications at: [email protected]
Feel free to share any insights or constructive criticism. Cheers!!
#Happy Sparking!!!..

