tobiasodion / datapipeline_trendingmovies
This project involved developing a data pipeline with Airflow and Python. The pipeline ingested data on trending movies and their distributors from IMDb and box office, cleansed, formatted, and combined it, and indexed the result in Elasticsearch. A dashboard was then built from the indexed data using Kibana. The tools and libraries used in this project were: Selenium for data ingestion, pandas for data cleansing and formatting, PySpark for data combination, and Elasticsearch and Kibana for indexing and analytics.
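
The stage order described above (cleanse, combine, index) can be illustrated with a minimal sketch. This is not the project's code: it uses only the Python standard library as a stand-in for pandas and PySpark, and the sample rows, the field names (`title`, `rating`, `distributor_id`), the function names, and the index name `trending_movies` are all hypothetical. The last step only renders the newline-delimited JSON body that the Elasticsearch bulk API accepts; it does not contact a cluster.

```python
import json

# Hypothetical sample rows standing in for the scraped movie and distributor data.
RAW_MOVIES = [
    {"title": " Movie A ", "rating": "8.1", "distributor_id": "d1"},
    {"title": "Movie B", "rating": "n/a", "distributor_id": "d2"},
    {"title": "Movie C", "rating": "7.4", "distributor_id": "d9"},
]
RAW_DISTRIBUTORS = [
    {"distributor_id": "d1", "name": "Studio One"},
    {"distributor_id": "d2", "name": "Studio Two"},
]


def cleanse(rows):
    """Trim titles and drop rows whose rating is not numeric."""
    cleaned = []
    for row in rows:
        try:
            rating = float(row["rating"])
        except ValueError:
            continue  # skip rows with an unparseable rating
        cleaned.append({**row, "title": row["title"].strip(), "rating": rating})
    return cleaned


def combine(movies, distributors):
    """Left-join movies to distributors on distributor_id."""
    names = {d["distributor_id"]: d["name"] for d in distributors}
    return [{**m, "distributor": names.get(m["distributor_id"])} for m in movies]


def to_bulk_payload(docs, index_name):
    """Render documents as newline-delimited JSON for the Elasticsearch bulk API."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(doc))  # document source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


combined = combine(cleanse(RAW_MOVIES), RAW_DISTRIBUTORS)
payload = to_bulk_payload(combined, "trending_movies")
```

In an Airflow deployment, each of these functions would plausibly map to its own task so that a failed stage can be retried without repeating the earlier ones.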