Data Warehouse & Business Intelligence

In this project, we analyze the historical crime records of Ireland generated by the Gardaí. We map this data against key metrics such as the population of each Garda division, unemployment rates and depression levels to analyze the relationship between crime types and crime-intensive areas.

The objective of this data warehouse is to derive business intelligence for the Gardaí and to provide insights that help regulate and reduce the occurrence of crime in the country.

Dataset

Data was collated from multiple sources:

  1. Gardaí offences data – Our primary source is a structured dataset available on www.data.gov.ie. It records the number of offences of each crime type at every Garda station in the country, and each station is mapped to one of the 26 Garda divisions of the Republic of Ireland. The data is downloaded from the website in CSV format. URL: https://data.gov.ie/dataset/crimes-at-garda-stations-level-2010-2016

  2. Wikipedia data – Our second source is a semi-structured dataset extracted from a Wikipedia page, listing the population and province of each division of Ireland. We extracted this dataset using R and cleansed it in the same code, removing the rows related to Northern Ireland and some columns we did not need (a sketch of this extraction appears after this list). URL: https://en.wikipedia.org/wiki/List_of_Irish_counties_by_population

  3. Twitter sentiment data – We also extracted tweets to run a sentiment analysis based on keywords such as “extortion”, “assault”, “theft”, “kidnapping”, “violence” and “murder”. For each division, we searched for tweets mentioning that area, ran the sentiment analysis on them, and saved the score in a separate data frame. This data frame, holding the sentiment score for all 26 divisions, is written out to a CSV from the R code (see the sketch after this list).

  4. Mock data – To aid our analysis, we mocked the data that was unavailable from the Irish CSO and other Irish data websites. We generated an unemployment rate (in %) and a depression level (a score of 0–100) for each of the 26 divisions over 4 years. This is needed for a comparative analysis of the occurrence of different crime types and whether any relation exists. This data is saved in CSV format (a generation sketch follows the list).
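
The Wikipedia extraction in item 2 can be sketched as follows, assuming the rvest package; the table index and the column names are assumptions about the live page rather than confirmed details of the project code.

```r
library(rvest)

url  <- "https://en.wikipedia.org/wiki/List_of_Irish_counties_by_population"
page <- read_html(url)

# Parse every HTML table on the page and keep the first one
# (assumption: it holds the county, province and population columns).
counties <- html_table(page, fill = TRUE)[[1]]

# Drop the Northern Ireland rows and the columns we do not need
# (the county list and column names here are assumptions).
ni <- c("Antrim", "Armagh", "Down", "Fermanagh", "Londonderry", "Tyrone")
counties <- counties[!counties$County %in% ni,
                     c("County", "Province", "Population")]

write.csv(counties, "division_population.csv", row.names = FALSE)
```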
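
A minimal sketch of the per-division sentiment scoring in item 3, assuming the rtweet and syuzhet packages and valid Twitter API credentials; the division names shown and the aggregation choice are illustrative, not the project's confirmed code.

```r
library(rtweet)    # tweet collection (requires API credentials)
library(syuzhet)   # lexicon-based sentiment scoring

divisions <- c("Dublin North Central", "Cork City", "Galway")  # ... all 26
keywords  <- c("extortion", "assault", "theft",
               "kidnapping", "violence", "murder")

scores <- data.frame(division = divisions, sentiment = NA_real_)

for (i in seq_along(divisions)) {
  # Search for tweets that mention the division alongside the crime keywords.
  query  <- paste(divisions[i], paste(keywords, collapse = " OR "))
  tweets <- search_tweets(query, n = 100)

  # Aggregate one sentiment score per division.
  scores$sentiment[i] <- sum(get_sentiment(tweets$text, method = "syuzhet"))
}

write.csv(scores, "division_sentiment.csv", row.names = FALSE)
```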
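
The mock data in item 4 can be generated along these lines; the value ranges and the exact years are assumptions chosen for plausibility, not real statistics.

```r
set.seed(2016)  # reproducible mock values

divisions <- c("Dublin North Central", "Cork City", "Galway")  # ... all 26
mock <- expand.grid(division = divisions, year = 2013:2016)  # 4 years assumed

# Assumed ranges: unemployment in percent, depression as a 0-100 score.
mock$unemployment_rate <- round(runif(nrow(mock), min = 4, max = 16), 1)
mock$depression_level  <- round(runif(nrow(mock), min = 0, max = 100))

write.csv(mock, "division_mock_metrics.csv", row.names = FALSE)
```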

Architecture

The Kimball approach is bottom-up: multi-dimensional tables are built around a central fact table, and the data is extracted and transformed to be housed in data marts. This method advocates denormalization of data across the warehouse. For our project we have followed the Kimball approach: data from the multiple sources is housed in the staging area, put through the ETL process, and used to build the data warehouse. Finally, we use OLAP to derive reports and analysis for our end client.
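
As a concrete illustration of the dimensional model, here is a minimal star-schema sketch. The repository does not list its DDL, so every table and column name below is an assumption; it is shown through R's DBI and odbc packages against an assumed staging database.

```r
library(DBI)

con <- dbConnect(odbc::odbc(), dsn = "DWH_Staging")  # assumed DSN

# Dimension tables surround the central fact table.
dbExecute(con, "
  CREATE TABLE dim_division (
    division_key  INT PRIMARY KEY,
    division_name VARCHAR(50),
    province      VARCHAR(50),
    population    INT
  )")

dbExecute(con, "
  CREATE TABLE dim_crime_type (
    crime_type_key INT PRIMARY KEY,
    crime_type     VARCHAR(100)
  )")

# The fact table holds the measures, keyed to each dimension.
dbExecute(con, "
  CREATE TABLE fact_crime (
    division_key      INT REFERENCES dim_division(division_key),
    crime_type_key    INT REFERENCES dim_crime_type(crime_type_key),
    year              INT,
    offence_count     INT,
    unemployment_rate FLOAT,
    depression_level  INT,
    sentiment_score   FLOAT
  )")
```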

ETL

Our ETL process is carried out in Integration Services in Visual Studio. We first truncate the raw data tables to remove all existing values and then load the source data onto them, so that any change in the source data is reflected in our data warehouse. This ETL process is re-runnable and is executed each time we wish to regenerate our fact and dimension tables; a sketch of the staging load follows.
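
The project runs these steps as an Integration Services package; the sketch below is an equivalent re-runnable staging load written with R's DBI and odbc packages, with the DSN, table names and file names all assumed.

```r
library(DBI)

con <- dbConnect(odbc::odbc(), dsn = "DWH_Staging")  # assumed DSN

# Truncate the raw tables so each run reflects the current source data.
raw_tables <- c("raw_offences", "raw_population",
                "raw_sentiment", "raw_mock_metrics")
for (tbl in raw_tables) {
  dbExecute(con, paste("TRUNCATE TABLE", tbl))
}

# Reload each source CSV into its raw staging table.
dbWriteTable(con, "raw_offences", read.csv("garda_offences.csv"), append = TRUE)
dbWriteTable(con, "raw_population", read.csv("division_population.csv"), append = TRUE)
dbWriteTable(con, "raw_sentiment", read.csv("division_sentiment.csv"), append = TRUE)
dbWriteTable(con, "raw_mock_metrics", read.csv("division_mock_metrics.csv"), append = TRUE)
```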

After the data is loaded into the raw tables in the staging area, we generate our dimension and fact tables. We start by truncating the dimension tables before loading data from the raw tables onto them. After the dimension tables, the fact table is created, and the measures as well as the primary keys are loaded onto it. The creation of the fact and dimension tables is achieved by running Execute SQL Tasks that use SQL JOIN queries to merge the data from the raw tables as per our business requirements.
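
One such Execute SQL Task can be sketched as a single JOIN over the raw tables that fills the fact table from the star schema above (after the dimensions have been populated); the join keys and column names remain assumptions.

```r
library(DBI)

con <- dbConnect(odbc::odbc(), dsn = "DWH_Staging")  # assumed DSN

dbExecute(con, "TRUNCATE TABLE fact_crime")

# Merge the raw tables on division (and year, where available) into the fact.
dbExecute(con, "
  INSERT INTO fact_crime (division_key, crime_type_key, year, offence_count,
                          unemployment_rate, depression_level, sentiment_score)
  SELECT d.division_key, c.crime_type_key, o.year, o.offence_count,
         m.unemployment_rate, m.depression_level, s.sentiment
  FROM raw_offences o
  JOIN dim_division d ON d.division_name = o.division
  JOIN dim_crime_type c ON c.crime_type = o.crime_type
  JOIN raw_mock_metrics m ON m.division = o.division AND m.year = o.year
  JOIN raw_sentiment s ON s.division = o.division
")
```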

Cube Deployment and BI Queries

After our fact and dimension tables are created, we move to an Analysis Services project in Visual Studio to deploy our cube. Our dimensional model, a star schema, is generated here.

Once the cube is generated, we visualize our data in Tableau to express three non-trivial BI queries.

Video Link

See the demonstration
