userclustering by dekkard

userclustering's Introduction

Paper: Towards Users Clustering by Analyzing Web Application Log Files through the Utilization of Spark

Abstract: Today, many data mining algorithms must handle very large volumes of data. Building on distributed computing, this work proposes a method for clustering users by analyzing large numbers of web application log files, and the method incorporates semantic access information. The process consists of data preprocessing, data cleaning, and a merging algorithm. It extracts users' access times, click counts, preferred content, and similar attributes from web application logs. Compared with standalone tools, it scales through batch processing and uses in-memory computing for log analysis. The program for processing web application log data is developed on Spark. The work also demonstrates Spark's strong data-processing performance and validates the method's efficiency and practicality. Experimental results show that, in a Spark cluster computing environment, the method processes large numbers of log files effectively and clearly improves data mining efficiency.
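The abstract names the pipeline stages (preprocessing and cleaning, merging records per user, mining access time, click counts and preferred content, then clustering) but not the log format, the exact features, or the clustering algorithm. The following is a minimal single-process Python sketch under loudly stated assumptions: a hypothetical log format `<user_id> <ISO-8601 timestamp> <HTTP status> <path>`, mean hour of day as the "access time" feature, the first path segment as the "content" category, and k-means with farthest-first initialization as the clustering step. None of these choices come from the paper, and the paper's program runs on Spark rather than plain Python; all function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict
from datetime import datetime

# ASSUMPTION: hypothetical log format, not taken from the paper:
#   <user_id> <ISO-8601 timestamp> <HTTP status> <request path>
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def parse_and_clean(lines):
    """Preprocessing/cleaning: drop malformed, failed, and static-resource requests."""
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # malformed line
        user, ts, status, path = parts
        if status != "200" or path.lower().endswith(STATIC_SUFFIXES):
            continue  # failed request or static asset, not a real "click"
        try:
            when = datetime.fromisoformat(ts)
        except ValueError:
            continue  # unparseable timestamp
        records.append((user, when, path))
    return records


def user_features(records, categories):
    """Merge records per user into [clicks, mean hour of day, share per category]."""
    by_user = defaultdict(list)
    for user, when, path in records:
        by_user[user].append((when, path))
    features = {}
    for user, visits in by_user.items():
        clicks = len(visits)
        mean_hour = sum(w.hour for w, _ in visits) / clicks
        # ASSUMPTION: content category = first path segment, e.g. "/news/a" -> "news"
        cats = Counter(p.strip("/").split("/")[0] for _, p in visits)
        features[user] = [clicks, mean_hour] + [cats[c] / clicks for c in categories]
    return features


def min_max_scale(vectors):
    """Rescale each feature column to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*vectors))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(vec, lo, hi)]
            for vec in vectors]


def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Usage with made-up sample lines (illustrative only, not data from the paper).
logs = [
    "u1 2024-01-01T09:15:00 200 /news/a",
    "u1 2024-01-01T09:20:00 200 /news/b",
    "u1 2024-01-01T09:21:00 200 /static/site.css",
    "u2 2024-01-01T10:05:00 200 /news/c",
    "u2 2024-01-01T10:06:00 200 /news/a",
    "u3 2024-01-01T21:30:00 200 /sports/x",
    "u3 2024-01-01T21:31:00 200 /sports/y",
    "u3 2024-01-01T21:32:00 200 /sports/z",
    "u4 2024-01-01T22:00:00 200 /sports/x",
    "u4 2024-01-01T22:10:00 200 /sports/y",
    "u4 2024-01-01T22:11:00 404 /sports/missing",
    "garbage line",
]
records = parse_and_clean(logs)
feats = user_features(records, ["news", "sports"])
users = sorted(feats)
labels = kmeans(min_max_scale([feats[u] for u in users]), k=2)
assignment = dict(zip(users, labels))
print(assignment)
```

In a Spark job, the per-line parsing and cleaning could become transformations over the log lines and the per-user merge a key-based aggregation; everything is kept in memory here only so the sketch stays self-contained.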
