
1book's People

Contributors

duoergun0729

1book's Issues

HMM-based detection of XSS attacks (1)

In Table 12-2 on page 151 of the book, is the example emission probability matrix wrong? All of its entries are hidden states (S). Also, the code in 12-2.py calls the Apriori algorithm and computes confidence; it never runs the HMM algorithm to estimate the parameter matrices.
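For reference, an HMM emission matrix maps each hidden state to a distribution over observation symbols, so its entries should be observation probabilities rather than states, and each row must sum to 1. A minimal sketch following the book's notation (states S1-S4, observation symbols A, C, N, T); the probability values below are made up for illustration:

```python
import numpy as np

# Hypothetical emission matrix B: rows = hidden states S1..S4,
# columns = observation symbols A, C, N, T. Every row is a
# probability distribution, so each row sums to 1.
B = np.array([
    [0.7, 0.1, 0.1, 0.1],  # S1
    [0.1, 0.7, 0.1, 0.1],  # S2
    [0.1, 0.1, 0.7, 0.1],  # S3
    [0.1, 0.1, 0.1, 0.7],  # S4
])

assert B.shape == (4, 4)
assert np.allclose(B.sum(axis=1), 1.0)  # rows are valid distributions
```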

Getting an error on Windows 7, how do I fix it?

Load file(../data/ADFA-LD/Training_Data_Master/UTD-0796.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0797.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0798.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0799.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0800.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0801.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0802.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0803.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0804.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0805.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0806.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0807.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0808.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0809.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0810.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0811.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0812.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0813.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0814.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0815.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0816.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0817.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0818.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0819.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0820.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0821.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0822.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0823.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0824.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0825.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0826.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0827.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0828.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0829.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0830.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0831.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0832.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0833.txt)
Traceback (most recent call last):
  File "D:\Users\Administrator\eclipsFaySon\secuit\code\8-2.py", line 76, in <module>
    score=model_selection.cross_val_score(logreg, x, y, n_jobs=-1, cv=10)
  File "D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.py", line 342, in cross_val_score
    pre_dispatch=pre_dispatch)
  File "D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.py", line 206, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "D:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 789, in __call__
    self.retrieve()
  File "D:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 740, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibValueError: JoblibValueError


Multiprocessing exception:
...........................................................................
D:\Users\Administrator\eclipsFaySon\secuit\code\8-2.py in <module>()
71 solver='sgd', verbose=10, tol=1e-4, random_state=1,
72 learning_rate_init=.1)
73
74 logreg = linear_model.LogisticRegression(C=1e5)
75
---> 76 score=model_selection.cross_val_score(logreg, x, y, n_jobs=-1, cv=10)
77 print np.mean(score)
78
79
80

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection_validation.py in cross_val_score(estimator=LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), X=array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], groups=None, scoring=None, cv=10, n_jobs=-1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs')
337 cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
338 scoring={'score': scorer}, cv=cv,
339 return_train_score=False,
340 n_jobs=n_jobs, verbose=verbose,
341 fit_params=fit_params,
--> 342 pre_dispatch=pre_dispatch)
pre_dispatch = '2*n_jobs'
343 return cv_results['test_score']
344
345
346 def _fit_and_score(estimator, X, y, scorer, train, test, verbose,

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection_validation.py in cross_validate(estimator=LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), X=array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], groups=None, scoring={'score': }, cv=StratifiedKFold(n_splits=10, random_state=None, shuffle=False), n_jobs=-1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', return_train_score=False)
201 scores = parallel(
202 delayed(_fit_and_score)(
203 clone(estimator), X, y, scorers, train, test, verbose, None,
204 fit_params, return_train_score=return_train_score,
205 return_times=True)
--> 206 for train, test in cv.split(X, y, groups))
cv.split = <bound method StratifiedKFold.split of Stratifie...d(n_splits=10, random_state=None, shuffle=False)>
X = array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64)
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
groups = None
207
208 if return_train_score:
209 train_scores, test_scores, fit_times, score_times = zip(*scores)
210 train_scores = _aggregate_score_dicts(train_scores)

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object >)
784 if pre_dispatch == "all" or n_jobs == 1:
785 # The iterable was consumed all at once by the above for loop.
786 # No need to wait for async callbacks to trigger to
787 # consumption.
788 self._iterating = False
--> 789 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
790 # Make sure that we get a last message telling us we are done
791 elapsed_time = time.time() - self._start_time
792 self._print('Done %3i out of %3i | elapsed: %s finished',
793 (len(self._output), len(self._output),


Sub-process traceback:

ValueError Wed Apr 3 08:53:56 2019
PID: 16256 Python 2.7.15: D:\ProgramData\Anaconda2\python.exe
...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
126 def __init__(self, iterator_slice):
127 self.items = list(iterator_slice)
128 self._size = len(self.items)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func =
args = (LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], {'score': }, array([ 84, 85, 86, 87, 88, 89, 90, 91, ... 826, 827, 828, 829, 830, 831, 832], dtype=int64), array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1..., 77, 78, 79, 80, 81, 82, 83],
dtype=int64), 0, None, None)
kwargs = {'return_times': True, 'return_train_score': False}
self.items = [(, (LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], {'score': }, array([ 84, 85, 86, 87, 88, 89, 90, 91, ... 826, 827, 828, 829, 830, 831, 832], dtype=int64), array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1..., 77, 78, 79, 80, 81, 82, 83],
dtype=int64), 0, None, None), {'return_times': True, 'return_train_score': False})]
132
133 def __len__(self):
134 return self._size
135

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection_validation.py in _fit_and_score(estimator=LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), X=array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], scorer={'score': }, train=array([ 84, 85, 86, 87, 88, 89, 90, 91, ... 826, 827, 828, 829, 830, 831, 832], dtype=int64), test=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1..., 77, 78, 79, 80, 81, 82, 83],
dtype=int64), verbose=0, parameters=None, fit_params={}, return_train_score=False, return_parameters=False, return_n_test_samples=False, return_times=True, error_score='raise')
453
454 try:
455 if y_train is None:
456 estimator.fit(X_train, **fit_params)
457 else:
--> 458 estimator.fit(X_train, y_train, **fit_params)
estimator.fit = <bound method LogisticRegression.fit of Logistic...inear', tol=0.0001, verbose=0, warm_start=False)>
X_train = array([[ 0, 0, 0, ..., 0, 0, 0],
... 0, 0, 0, ..., 0, 0, 0]], dtype=int64)
y_train = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
fit_params = {}
459
460 except Exception as e:
461 # Note fit time as time until error
462 fit_time = time.time() - start_time

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\linear_model\logistic.py in fit(self=LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), X=array([[ 0., 0., 0., ..., 0., 0., 0.]... [ 0., 0., 0., ..., 0., 0., 0.]]), y=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0]), sample_weight=None)
1232 " = {}.".format(self.n_jobs))
1233 self.coef_, self.intercept_, n_iter_ = _fit_liblinear(
1234 X, y, self.C, self.fit_intercept, self.intercept_scaling,
1235 self.class_weight, self.penalty, self.dual, self.verbose,
1236 self.max_iter, self.tol, self.random_state,
-> 1237 sample_weight=sample_weight)
sample_weight = None
1238 self.n_iter_ = np.array([n_iter_])
1239 return self
1240
1241 if self.solver in ['sag', 'saga']:

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\svm\base.py in _fit_liblinear(X=array([[ 0., 0., 0., ..., 0., 0., 0.]... [ 0., 0., 0., ..., 0., 0., 0.]]), y=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0]), C=100000.0, fit_intercept=True, intercept_scaling=1, class_weight=None, penalty='l2', dual=False, verbose=0, max_iter=100, tol=0.0001, random_state=None, multi_class='ovr', loss='logistic_regression', epsilon=0.1, sample_weight=None)
848 y_ind = enc.fit_transform(y)
849 classes_ = enc.classes_
850 if len(classes_) < 2:
851 raise ValueError("This solver needs samples of at least 2 classes"
852 " in the data, but the data contains only one"
--> 853 " class: %r" % classes_[0])
classes_ = array([0])
854
855 class_weight_ = compute_class_weight(class_weight, classes_, y)
856 else:
857 class_weight_ = np.empty(0, dtype=np.float64)

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0
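The final ValueError means every training fold handed to LogisticRegression contained only class 0, which suggests the label vector y was built from a single kind of sample. A minimal reproduction with hypothetical data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.zeros((10, 3))
y = np.zeros(10, dtype=int)  # labels contain only class 0

try:
    LogisticRegression(C=1e5).fit(X, y)
except ValueError as e:
    print(e)  # the same "at least 2 classes" error as above
```

Checking `set(y)` before calling cross_val_score (it should contain both 0 and 1, i.e. both normal and attack samples) is usually enough to locate where the label construction went wrong.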


The MNIST dataset in Chapter 14's neural-network handwritten-digit example can no longer be downloaded online

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_mldata
from sklearn.neural_network import MLPClassifier

mnist = fetch_mldata("MNIST original")

# rescale the data, use the traditional train/test split

X, y = mnist.data / 255., mnist.target
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

HTTPError: HTTP Error 404: Dataset 'mnist-original' not found on mldata.org.
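The 404 happens because mldata.org has shut down; fetch_mldata was deprecated in scikit-learn 0.20 and later removed. A sketch of a replacement using fetch_openml (the OpenML dataset name "mnist_784" is the standard mirror of "MNIST original"; the split indices follow the original snippet):

```python
from sklearn.datasets import fetch_openml

def load_mnist():
    # fetch_openml replaces fetch_mldata("MNIST original"); it downloads
    # the data from openml.org on first call and caches it locally.
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X, y = mnist.data / 255.0, mnist.target.astype(int)
    # the traditional 60000/10000 train/test split, as in the book's code
    return X[:60000], X[60000:], y[:60000], y[60000:]
```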

A question about the code in 7-3.py

line 37:
webshell_bigram_vectorizer = CountVectorizer(ngram_range=(1, 1), decode_error="ignore",
token_pattern = r_token_pattern,min_df=1)

line 45:
wp_bigram_vectorizer = CountVectorizer(ngram_range=(2, 2), decode_error="ignore",
token_pattern = r_token_pattern,min_df=1,vocabulary=vocabulary)

When featurizing the black samples, a 1-gram model is used, so the extracted vocabulary also consists of 1-grams.

But the white samples are still featurized with 2-grams, so the strings extracted from them can never appear in that vocabulary...
The white samples' feature vectors therefore all come out as [0 0 0 ... 0 0 0].
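The observation is easy to confirm: a vocabulary built from 1-grams can never match the 2-gram tokens produced by the second vectorizer, so every transformed row is zero. A minimal sketch with made-up samples (7-3.py uses a custom r_token_pattern; the default tokenizer is enough to show the effect):

```python
from sklearn.feature_extraction.text import CountVectorizer

black = ["eval base64_decode eval gzinflate"]  # hypothetical black sample
white = ["echo strlen echo substr"]            # hypothetical white sample

# Fit a 1-gram vectorizer on the black samples, as 7-3.py does:
v1 = CountVectorizer(ngram_range=(1, 1), decode_error="ignore", min_df=1)
v1.fit(black)
vocabulary = v1.vocabulary_  # keys are single tokens (1-grams)

# Reuse that 1-gram vocabulary with a 2-gram vectorizer:
v2 = CountVectorizer(ngram_range=(2, 2), decode_error="ignore",
                     min_df=1, vocabulary=vocabulary)

# Every 2-gram ("echo strlen", ...) misses the 1-gram vocabulary,
# so the resulting feature matrix is all zeros.
print(v2.transform(white).toarray())
```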

A question about the code in 5-4.py

Warning (from warnings module):
File "C:\Users\lenvov\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\model_selection\_split.py", line 665
warnings.warn(("The least populated class in y has only %d"
UserWarning: The least populated class in y has only 2 members, which is less than n_splits=10.
[1.         1.         1.         1.         1.         0.77777778
 1.         1.         0.88888889 0.88888889]

The 5-4.py output above shows that something is off with the 10-fold cross-validation; it is related to the samples.

The results differ from the book's.

Hello 兜哥: with N=100, running under Python 3.6 gives 100, while your book reports 83.3333. When I change N to 90, the result becomes 83.333. Could you explain this? Is there a bug in the code?
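The warning itself is easy to reproduce: StratifiedKFold emits it whenever some class has fewer members than n_splits, and the fold scores (and their mean) then depend heavily on how the few minority samples land in the folds. A sketch with hypothetical data (GaussianNB stands in for whichever estimator 5-4.py uses):

```python
import warnings
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X = np.random.RandomState(0).rand(12, 2)
y = np.array([0] * 10 + [1] * 2)  # minority class has only 2 members

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # n_splits=10 exceeds the minority class size, so StratifiedKFold
    # emits the same "least populated class" UserWarning as in 5-4.py
    scores = cross_val_score(GaussianNB(), X, y, cv=10)

print(any("least populated" in str(w.message) for w in caught))
```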

Question about the code that finds the 50 most-used and 50 least-used commands

Hi 兜哥,

On page 78 of 《Web安全机器学习入门》, the code that "finds the 50 most frequently used and 50 least frequently used commands" reads:

fdist = FreqDist(dist).keys()
dist_max=set(fdist[0:50])
dist_min = set(fdist[-50:])

I checked the official documentation for FreqDist: it counts word frequencies. For example, iterating with a temporary variable tmp gives results like:

tmp  = FreqDist(dist)
for key in tmp:
    print key+" : "+str(tmp[key])

gs : 4
tset : 1
basename : 616
uname : 443
touch : 3
... ...
So the data here is already deduplicated, but fdist[0:50] and fdist[-50:] do not give the 50 most and 50 least frequently used commands; fdist[0:10], for instance, just returns the first 10 dictionary keys.

Any guidance would be appreciated, thanks!
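For what it's worth, in current NLTK FreqDist subclasses collections.Counter, so most_common() gives the frequency-sorted list the book's code seems to assume (in very old NLTK versions keys() happened to come back frequency-sorted, which may be what the book relied on). A sketch using the stdlib Counter, which has the same interface:

```python
from collections import Counter

# toy command stream; nltk.FreqDist(dist) behaves the same as Counter(dist)
dist = ["basename", "uname", "basename", "touch", "uname", "basename", "gs"]
fdist = Counter(dist)

# most_common() is sorted by descending frequency, unlike plain key order:
ranked = [cmd for cmd, _ in fdist.most_common()]
dist_max = set(ranked[:50])   # up to the 50 most frequent commands
dist_min = set(ranked[-50:])  # up to the 50 least frequent commands
```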

Request for the code

Hello, could you send me a copy of the code? I have been trying to download it for three days without success.
My email is: [email protected]
Thank you for making the code available.

Erratum for the fourth line from the bottom of page 132

P132, fourth line from the bottom: "from sklearn.cluster import K-Means" should be "from sklearn.cluster import KMeans".

Errata

Hi 兜哥,
Here are some issues I found while working through the book; apologies in advance if any of these corrections are themselves wrong.

P12, Figure 2-1 (mainstream Python machine-learning libraries): the axis labels are misspelled; they should be Commits and Contributors.
P14: >>>a = array([1, 2, 3]) should be a = np.array([1, 2, 3])
P15: >>>c.shape(2,3,2,3) should be c.reshape(2,3,2,3)
P37 should be changed to:
TensorFlow has a similar implementation:

from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.contrib import learn
MAX_DOCUMENT_LENGTH=100

Samples cannot be read

7-6.py uses mnist.pkl.gz; reading its contents under Python 3.6 raises UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)
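The pickle inside mnist.pkl.gz was written by Python 2; under Python 3, pickle's default ASCII decoding of its byte strings fails with exactly this error. Passing encoding='latin1' to pickle.load is the usual fix for Python-2 pickles that contain numpy arrays. A sketch (the filename and the three-tuple layout follow the standard mnist.pkl.gz):

```python
import gzip
import pickle

def load_mnist_pkl(path="mnist.pkl.gz"):
    with gzip.open(path, "rb") as f:
        # encoding='latin1' decodes Python-2 byte strings losslessly,
        # avoiding the UnicodeDecodeError under Python 3
        train_set, valid_set, test_set = pickle.load(f, encoding="latin1")
    return train_set, valid_set, test_set
```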

On page 201: x and _y, and the system's inputs and outputs

First, shouldn't _y here be y_?
Second, y_ should correspond to the one-hot-encoded values along the second dimension of mnist.train.labels (a [60000, 10] matrix holding the training-set labels), not to the output of the whole system, right?
The output of the whole system should be y, shouldn't it?

Likewise, x corresponds to values along the second dimension of mnist.train.images (a [60000, 784] tensor whose first dimension indexes images and whose second dimension indexes the pixels of each image), right?
And the input to the whole system is formed by x and y_ together?

Some issues in the book, plus one question!!! (1st edition, 3rd printing, November 2017)

P134, last line: the indentation of "plt.show()" is wrong; it should be aligned with the for.
P139, line 2: "那么称这个事件A。为k项集事件A" should read "那么称这个事件A为k项集事件A" (the stray period removed).
P151, Table 12-2: an emission probability is the probability of a hidden state emitting an observation, so Table 12-2 should be the emission matrix from S1-S4 to A, C, N, T.
P153, line 1: "参数异常就输入第三种" should read "参数异常就属于第三种" (parameter anomalies "belong to" the third kind, not "input").
P160, line 12: "if domain>=MIN_LEN:" should be "if len(domain)>=MIN_LEN:"
P160, 13th line from the bottom: "if domain >= MIN_LEN:" should be "if len(domain)>=MIN_LEN:"
P165, line 1: "ne04j启动" and the default password "ne04j/ne04j" in line 2 should be "neo4j启动" (starting neo4j) and "neo4j/neo4j".
P185, line 3: the sentence says "file1 and file1"; it should say "file1 and file2" (the two files suspected of being backdoors controlled by the same criminal operation).
P190, 5th line from the bottom (Figure 14-6): "24X24的图片" should be "28X28的图片" (28x28 images).
P201, 13th and 14th lines from the bottom: "y" should be "y_".
P203, 10th and 11th lines from the bottom: "y" should be "y_".

One question that needs an answer:
P143, last two lines: the text says "the value of the start node represents the support of this itemset". Shouldn't it say the value represents the itemset's occurrence count? Support is a ratio, so it should lie between 0 and 1. Likewise, "the support of itemset {z} is 5..." in the last line presumably means its occurrence count is 5?
P144, line 5: "the minimum support is set to 3", but isn't support a ratio? 兜哥, please clarify, O(∩_∩)O thanks!

A question about MasqueradeDat/label.txt in 1book

Hi, I have recently started studying your book on machine learning for web security. How was the file MasqueradeDat/label.txt generated? Your code uses it, but I don't see it in the original dataset. What is it for, and how is it produced?
