Comments (6)
Hi @tunglambk,
Sure, it's my pleasure!
The "ground truth" templates for the HDFS log messages were downloaded from the homepage of Prof. Wei Xu, and we manually moved them into the Python project.
Based on our experience with log parsers, I think it is safe to say that they were manually written or extracted from the source code of the target system. Log parsers are highly likely to introduce noise.
Hope this helps. If you have any other questions, feel free to comment and I am happy to answer :P
from plelog.
Hi Dr. @YangLin-George,
Thank you so much for your enthusiastic support.
From your explanation, I understand that if we can extract the log templates from the source code, we should use them, as the parse_by_Official() function does, to avoid noise from the log parser. Otherwise, we can parse the logs with IBM Drain3, as the parse_by_IBM() function does. Is this right?
In my opinion, log parsing is usually necessary when doing research. For example, the widely used BGL dataset does not have such a "ground truth" like HDFS.
However, when you apply an anomaly detection approach in industry and the real templates are available, it's better to use them directly.
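To make "use the real templates directly" a bit more concrete, here is a minimal sketch of matching log messages against a known template list. It is only an illustration, not the code in this project: the two templates, the `<*>` wildcard convention, and the helper names `compile_template` and `match_template` are all assumptions made for the example.

```python
import re

WILDCARD = "<*>"  # assumed placeholder for a variable field in a template


def compile_template(template):
    """Turn a template such as 'Deleting block <*>' into an anchored regex."""
    literal_parts = [re.escape(part) for part in template.split(WILDCARD)]
    # Each wildcard becomes a non-greedy group that matches at least one character.
    return re.compile("(.+?)".join(literal_parts))


def match_template(message, compiled_templates):
    """Return the id of the first template matching the whole message, else None."""
    for template_id, pattern in compiled_templates.items():
        if pattern.fullmatch(message):
            return template_id
    return None


# Hypothetical templates, written only for this example.
templates = {
    "E1": "Receiving block <*> src: <*> dest: <*>",
    "E2": "Deleting block <*> file <*>",
}
compiled = {tid: compile_template(text) for tid, text in templates.items()}

line = "Receiving block blk_123 src: /10.0.0.1:50010 dest: /10.0.0.2:50010"
print(match_template(line, compiled))
print(match_template("an unseen log message", compiled))
```

A message that matches no known template returns None. That is the case where a log parser such as Drain would still be needed as a fallback.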
Hi Dr. @YangLin-George,
Yes, I understand. I have one more question, though.
> For example, the widely used BGL dataset does not have such a "ground truth" like HDFS.
The parse_by_Official() function of the BGLLoader class also has templates, like the one in HDFSLoader. How did you extract them? Thank you so much.
Those were written based on the templates generated by Drain. We prepared them while studying the impact of log parsing.
Thank you so much for your support and great work, Dr. @YangLin-George. I'll close this issue here.