
1book's People

Contributors

duoergun0729

1book's Issues

HMM-based detection of XSS attacks (1)

In Table 12-2 on page 151 of the book, is the example emission probability matrix wrong? All of its entries are hidden states (S). Also, the code in 12-2.py calls the Apriori algorithm and computes confidence; it never runs the HMM algorithm to estimate the parameter matrices.
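For reference, an HMM emission matrix maps each hidden state to a distribution over observation symbols, so its entries should be observation probabilities rather than states, and each row must sum to 1. A minimal sketch following the book's notation (states S1-S4, observation symbols A, C, N, T); the probability values below are made up for illustration:

```python
import numpy as np

# Hypothetical emission matrix B: rows = hidden states S1..S4,
# columns = observation symbols A, C, N, T. Every row is a
# probability distribution, so each row sums to 1.
B = np.array([
    [0.7, 0.1, 0.1, 0.1],  # S1
    [0.1, 0.7, 0.1, 0.1],  # S2
    [0.1, 0.1, 0.7, 0.1],  # S3
    [0.1, 0.1, 0.1, 0.7],  # S4
])

assert B.shape == (4, 4)
assert np.allclose(B.sum(axis=1), 1.0)  # rows are valid distributions
```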

Getting an error on Windows 7, how do I fix it?

Load file(../data/ADFA-LD/Training_Data_Master/UTD-0796.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0797.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0798.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0799.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0800.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0801.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0802.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0803.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0804.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0805.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0806.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0807.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0808.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0809.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0810.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0811.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0812.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0813.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0814.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0815.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0816.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0817.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0818.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0819.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0820.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0821.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0822.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0823.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0824.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0825.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0826.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0827.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0828.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0829.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0830.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0831.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0832.txt)
Load file(../data/ADFA-LD/Training_Data_Master/UTD-0833.txt)
Traceback (most recent call last):
  File "D:\Users\Administrator\eclipsFaySon\secuit\code\8-2.py", line 76, in <module>
    score=model_selection.cross_val_score(logreg, x, y, n_jobs=-1, cv=10)
  File "D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.py", line 342, in cross_val_score
    pre_dispatch=pre_dispatch)
  File "D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection\_validation.py", line 206, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "D:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 789, in __call__
    self.retrieve()
  File "D:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py", line 740, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibValueError: JoblibValueError


Multiprocessing exception:
...........................................................................
D:\Users\Administrator\eclipsFaySon\secuit\code\8-2.py in <module>()
71 solver='sgd', verbose=10, tol=1e-4, random_state=1,
72 learning_rate_init=.1)
73
74 logreg = linear_model.LogisticRegression(C=1e5)
75
---> 76 score=model_selection.cross_val_score(logreg, x, y, n_jobs=-1, cv=10)
77 print np.mean(score)
78
79
80

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection_validation.py in cross_val_score(estimator=LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), X=array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], groups=None, scoring=None, cv=10, n_jobs=-1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs')
337 cv_results = cross_validate(estimator=estimator, X=X, y=y, groups=groups,
338 scoring={'score': scorer}, cv=cv,
339 return_train_score=False,
340 n_jobs=n_jobs, verbose=verbose,
341 fit_params=fit_params,
--> 342 pre_dispatch=pre_dispatch)
pre_dispatch = '2*n_jobs'
343 return cv_results['test_score']
344
345
346 def _fit_and_score(estimator, X, y, scorer, train, test, verbose,

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection_validation.py in cross_validate(estimator=LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), X=array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], groups=None, scoring={'score': }, cv=StratifiedKFold(n_splits=10, random_state=None, shuffle=False), n_jobs=-1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', return_train_score=False)
201 scores = parallel(
202 delayed(_fit_and_score)(
203 clone(estimator), X, y, scorers, train, test, verbose, None,
204 fit_params, return_train_score=return_train_score,
205 return_times=True)
--> 206 for train, test in cv.split(X, y, groups))
cv.split = <bound method StratifiedKFold.split of Stratifie...d(n_splits=10, random_state=None, shuffle=False)>
X = array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64)
y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
groups = None
207
208 if return_train_score:
209 train_scores, test_scores, fit_times, score_times = zip(*scores)
210 train_scores = _aggregate_score_dicts(train_scores)

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=Parallel(n_jobs=-1), iterable=<generator object >)
784 if pre_dispatch == "all" or n_jobs == 1:
785 # The iterable was consumed all at once by the above for loop.
786 # No need to wait for async callbacks to trigger to
787 # consumption.
788 self._iterating = False
--> 789 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=-1)>
790 # Make sure that we get a last message telling us we are done
791 elapsed_time = time.time() - self._start_time
792 self._print('Done %3i out of %3i | elapsed: %s finished',
793 (len(self._output), len(self._output),


Sub-process traceback:

ValueError Wed Apr 3 08:53:56 2019
PID: 16256 Python 2.7.15: D:\ProgramData\Anaconda2\python.exe
...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self=<sklearn.externals.joblib.parallel.BatchedCalls object>)
126 def __init__(self, iterator_slice):
127 self.items = list(iterator_slice)
128 self._size = len(self.items)
129
130 def __call__(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func =
args = (LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], {'score': }, array([ 84, 85, 86, 87, 88, 89, 90, 91, ... 826, 827, 828, 829, 830, 831, 832], dtype=int64), array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1..., 77, 78, 79, 80, 81, 82, 83],
dtype=int64), 0, None, None)
kwargs = {'return_times': True, 'return_train_score': False}
self.items = [(, (LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], {'score': }, array([ 84, 85, 86, 87, 88, 89, 90, 91, ... 826, 827, 828, 829, 830, 831, 832], dtype=int64), array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1..., 77, 78, 79, 80, 81, 82, 83],
dtype=int64), 0, None, None), {'return_times': True, 'return_train_score': False})]
132
133 def __len__(self):
134 return self._size
135

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\model_selection_validation.py in _fit_and_score(estimator=LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), X=array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0,...0],
[0, 0, 0, ..., 0, 0, 0]], dtype=int64), y=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...], scorer={'score': }, train=array([ 84, 85, 86, 87, 88, 89, 90, 91, ... 826, 827, 828, 829, 830, 831, 832], dtype=int64), test=array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1..., 77, 78, 79, 80, 81, 82, 83],
dtype=int64), verbose=0, parameters=None, fit_params={}, return_train_score=False, return_parameters=False, return_n_test_samples=False, return_times=True, error_score='raise')
453
454 try:
455 if y_train is None:
456 estimator.fit(X_train, **fit_params)
457 else:
--> 458 estimator.fit(X_train, y_train, **fit_params)
estimator.fit = <bound method LogisticRegression.fit of Logistic...inear', tol=0.0001, verbose=0, warm_start=False)>
X_train = array([[ 0, 0, 0, ..., 0, 0, 0],
... 0, 0, 0, ..., 0, 0, 0]], dtype=int64)
y_train = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
fit_params = {}
459
460 except Exception as e:
461 # Note fit time as time until error
462 fit_time = time.time() - start_time

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\linear_model\logistic.py in fit(self=LogisticRegression(C=100000.0, class_weight=None...linear', tol=0.0001, verbose=0, warm_start=False), X=array([[ 0., 0., 0., ..., 0., 0., 0.]... [ 0., 0., 0., ..., 0., 0., 0.]]), y=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0]), sample_weight=None)
1232 " = {}.".format(self.n_jobs))
1233 self.coef_, self.intercept_, n_iter_ = _fit_liblinear(
1234 X, y, self.C, self.fit_intercept, self.intercept_scaling,
1235 self.class_weight, self.penalty, self.dual, self.verbose,
1236 self.max_iter, self.tol, self.random_state,
-> 1237 sample_weight=sample_weight)
sample_weight = None
1238 self.n_iter_ = np.array([n_iter_])
1239 return self
1240
1241 if self.solver in ['sag', 'saga']:

...........................................................................
D:\ProgramData\Anaconda2\lib\site-packages\sklearn\svm\base.py in _fit_liblinear(X=array([[ 0., 0., 0., ..., 0., 0., 0.]... [ 0., 0., 0., ..., 0., 0., 0.]]), y=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0]), C=100000.0, fit_intercept=True, intercept_scaling=1, class_weight=None, penalty='l2', dual=False, verbose=0, max_iter=100, tol=0.0001, random_state=None, multi_class='ovr', loss='logistic_regression', epsilon=0.1, sample_weight=None)
848 y_ind = enc.fit_transform(y)
849 classes_ = enc.classes_
850 if len(classes_) < 2:
851 raise ValueError("This solver needs samples of at least 2 classes"
852 " in the data, but the data contains only one"
--> 853 " class: %r" % classes_[0])
classes_ = array([0])
854
855 class_weight_ = compute_class_weight(class_weight, classes_, y)
856 else:
857 class_weight_ = np.empty(0, dtype=np.float64)

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0
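The final ValueError means every training fold handed to LogisticRegression contained only class 0, which suggests the label vector y was built from a single kind of sample. A minimal reproduction with hypothetical data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.zeros((10, 3))
y = np.zeros(10, dtype=int)  # labels contain only class 0

try:
    LogisticRegression(C=1e5).fit(X, y)
except ValueError as e:
    print(e)  # the same "at least 2 classes" error as above
```

Checking `set(y)` before calling cross_val_score (it should contain both 0 and 1, i.e. both normal and attack samples) is usually enough to locate where the label construction went wrong.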


The MNIST dataset in Chapter 14's neural-network handwritten-digit example can no longer be downloaded online

import matplotlib.pyplot as plt
from sklearn.datasets import fetch_mldata
from sklearn.neural_network import MLPClassifier

mnist = fetch_mldata("MNIST original")

# rescale the data, use the traditional train/test split

X, y = mnist.data / 255., mnist.target
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

HTTPError: HTTP Error 404: Dataset 'mnist-original' not found on mldata.org.
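The 404 happens because mldata.org has shut down; fetch_mldata was deprecated in scikit-learn 0.20 and later removed. A sketch of a replacement using fetch_openml (the OpenML dataset name "mnist_784" is the standard mirror of "MNIST original"; the split indices follow the original snippet):

```python
from sklearn.datasets import fetch_openml

def load_mnist():
    # fetch_openml replaces fetch_mldata("MNIST original"); it downloads
    # the data from openml.org on first call and caches it locally.
    mnist = fetch_openml("mnist_784", version=1, as_frame=False)
    X, y = mnist.data / 255.0, mnist.target.astype(int)
    # the traditional 60000/10000 train/test split, as in the book's code
    return X[:60000], X[60000:], y[:60000], y[60000:]
```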

A question about the code in 7-3.py

line 37:
webshell_bigram_vectorizer = CountVectorizer(ngram_range=(1, 1), decode_error="ignore",
token_pattern = r_token_pattern,min_df=1)

line 45:
wp_bigram_vectorizer = CountVectorizer(ngram_range=(2, 2), decode_error="ignore",
token_pattern = r_token_pattern,min_df=1,vocabulary=vocabulary)

When featurizing the black samples, a 1-gram model is used, so the extracted vocabulary also consists of 1-grams.

But the white samples are still featurized with 2-grams, so the strings extracted from them can never appear in that vocabulary...
The white samples' feature vectors therefore all come out as [0 0 0 ... 0 0 0].
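The observation is easy to confirm: a vocabulary built from 1-grams can never match the 2-gram tokens produced by the second vectorizer, so every transformed row is zero. A minimal sketch with made-up samples (7-3.py uses a custom r_token_pattern; the default tokenizer is enough to show the effect):

```python
from sklearn.feature_extraction.text import CountVectorizer

black = ["eval base64_decode eval gzinflate"]  # hypothetical black sample
white = ["echo strlen echo substr"]            # hypothetical white sample

# Fit a 1-gram vectorizer on the black samples, as 7-3.py does:
v1 = CountVectorizer(ngram_range=(1, 1), decode_error="ignore", min_df=1)
v1.fit(black)
vocabulary = v1.vocabulary_  # keys are single tokens (1-grams)

# Reuse that 1-gram vocabulary with a 2-gram vectorizer:
v2 = CountVectorizer(ngram_range=(2, 2), decode_error="ignore",
                     min_df=1, vocabulary=vocabulary)

# Every 2-gram ("echo strlen", ...) misses the 1-gram vocabulary,
# so the resulting feature matrix is all zeros.
print(v2.transform(white).toarray())
```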

A question about the code in 5-4.py

Warning (from warnings module):
File "C:\Users\lenvov\AppData\Local\Programs\Python\Python38\lib\site-packages\sklearn\model_selection\_split.py", line 665
warnings.warn(("The least populated class in y has only %d"
UserWarning: The least populated class in y has only 2 members, which is less than n_splits=10.
[1.         1.         1.         1.         1.         0.77777778
 1.         1.         0.88888889 0.88888889]

The 5-4.py output above shows that something is off with the 10-fold cross-validation; it is related to the samples.

The results differ from the book's.

Hello 兜哥: with N=100, running under Python 3.6 gives 100, while your book reports 83.3333. When I change N to 90, the result becomes 83.333. Could you explain this? Is there a bug in the code?
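The warning itself is easy to reproduce: StratifiedKFold emits it whenever some class has fewer members than n_splits, and the fold scores (and their mean) then depend heavily on how the few minority samples land in the folds. A sketch with hypothetical data (GaussianNB stands in for whichever estimator 5-4.py uses):

```python
import warnings
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X = np.random.RandomState(0).rand(12, 2)
y = np.array([0] * 10 + [1] * 2)  # minority class has only 2 members

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # n_splits=10 exceeds the minority class size, so StratifiedKFold
    # emits the same "least populated class" UserWarning as in 5-4.py
    scores = cross_val_score(GaussianNB(), X, y, cv=10)

print(any("least populated" in str(w.message) for w in caught))
```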

Question about the code that finds the 50 most-used and 50 least-used commands

Hi 兜哥,

On page 78 of 《Web安全机器学习入门》, the code that "finds the 50 most frequently used and 50 least frequently used commands" reads:

fdist = FreqDist(dist).keys()
dist_max=set(fdist[0:50])
dist_min = set(fdist[-50:])

I checked the official documentation for FreqDist: it counts word frequencies. For example, iterating with a temporary variable tmp gives results like:

tmp  = FreqDist(dist)
for key in tmp:
    print key+" : "+str(tmp[key])

gs : 4
tset : 1
basename : 616
uname : 443
touch : 3
... ...
So the data here is already deduplicated, but fdist[0:50] and fdist[-50:] do not give the 50 most and 50 least frequently used commands; fdist[0:10], for instance, just returns the first 10 dictionary keys.

Any guidance would be appreciated, thanks!
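For what it's worth, in current NLTK FreqDist subclasses collections.Counter, so most_common() gives the frequency-sorted list the book's code seems to assume (in very old NLTK versions keys() happened to come back frequency-sorted, which may be what the book relied on). A sketch using the stdlib Counter, which has the same interface:

```python
from collections import Counter

# toy command stream; nltk.FreqDist(dist) behaves the same as Counter(dist)
dist = ["basename", "uname", "basename", "touch", "uname", "basename", "gs"]
fdist = Counter(dist)

# most_common() is sorted by descending frequency, unlike plain key order:
ranked = [cmd for cmd, _ in fdist.most_common()]
dist_max = set(ranked[:50])   # up to the 50 most frequent commands
dist_min = set(ranked[-50:])  # up to the 50 least frequent commands
```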

Request for the code

Hello, could you send me a copy of the code? I have been trying to download it for three days without success.
My email is: [email protected]
Thank you for making the code available.

Erratum for the fourth line from the bottom of page 132

P132, fourth line from the bottom: "from sklearn.cluster import K-Means" should be "from sklearn.cluster import KMeans".

Errata

Hi 兜哥,
Here are some issues I found while working through the book; apologies in advance if any of these corrections are themselves wrong.

P12, Figure 2-1 (mainstream Python machine-learning libraries): the axis labels are misspelled; they should be Commits and Contributors.
P14: >>>a = array([1, 2, 3]) should be a = np.array([1, 2, 3])
P15: >>>c.shape(2,3,2,3) should be c.reshape(2,3,2,3)
P37 should be changed to:
TensorFlow has a similar implementation:

from sklearn.feature_extraction.text import CountVectorizer
from tensorflow.contrib import learn
MAX_DOCUMENT_LENGTH=100

Samples cannot be read

7-6.py uses mnist.pkl.gz; reading its contents under Python 3.6 raises UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)
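The pickle inside mnist.pkl.gz was written by Python 2; under Python 3, pickle's default ASCII decoding of its byte strings fails with exactly this error. Passing encoding='latin1' to pickle.load is the usual fix for Python-2 pickles that contain numpy arrays. A sketch (the filename and the three-tuple layout follow the standard mnist.pkl.gz):

```python
import gzip
import pickle

def load_mnist_pkl(path="mnist.pkl.gz"):
    with gzip.open(path, "rb") as f:
        # encoding='latin1' decodes Python-2 byte strings losslessly,
        # avoiding the UnicodeDecodeError under Python 3
        train_set, valid_set, test_set = pickle.load(f, encoding="latin1")
    return train_set, valid_set, test_set
```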

On page 201: x and _y, and the system's inputs and outputs

First, shouldn't _y here be y_?
Second, y_ should correspond to the one-hot-encoded values along the second dimension of mnist.train.labels (a [60000, 10] matrix holding the training-set labels), not to the output of the whole system, right?
The output of the whole system should be y, shouldn't it?

Likewise, x corresponds to values along the second dimension of mnist.train.images (a [60000, 784] tensor whose first dimension indexes images and whose second dimension indexes the pixels of each image), right?
And the input to the whole system is formed by x and y_ together?

Some issues in the book, plus one question!!! (1st edition, 3rd printing, November 2017)

P134, last line: the indentation of "plt.show()" is wrong; it should be aligned with the for.
P139, line 2: "那么称这个事件A。为k项集事件A" should read "那么称这个事件A为k项集事件A" (the stray period removed).
P151, Table 12-2: an emission probability is the probability of a hidden state emitting an observation, so Table 12-2 should be the emission matrix from S1-S4 to A, C, N, T.
P153, line 1: "参数异常就输入第三种" should read "参数异常就属于第三种" (parameter anomalies "belong to" the third kind, not "input").
P160, line 12: "if domain>=MIN_LEN:" should be "if len(domain)>=MIN_LEN:"
P160, 13th line from the bottom: "if domain >= MIN_LEN:" should be "if len(domain)>=MIN_LEN:"
P165, line 1: "ne04j启动" and the default password "ne04j/ne04j" in line 2 should be "neo4j启动" (starting neo4j) and "neo4j/neo4j".
P185, line 3: the sentence says "file1 and file1"; it should say "file1 and file2" (the two files suspected of being backdoors controlled by the same criminal operation).
P190, 5th line from the bottom (Figure 14-6): "24X24的图片" should be "28X28的图片" (28x28 images).
P201, 13th and 14th lines from the bottom: "y" should be "y_".
P203, 10th and 11th lines from the bottom: "y" should be "y_".

One question that needs an answer:
P143, last two lines: the text says "the value of the start node represents the support of this itemset". Shouldn't it say the value represents the itemset's occurrence count? Support is a ratio, so it should lie between 0 and 1. Likewise, "the support of itemset {z} is 5..." in the last line presumably means its occurrence count is 5?
P144, line 5: "the minimum support is set to 3", but isn't support a ratio? 兜哥, please clarify, O(∩_∩)O thanks!

A question about MasqueradeDat/label.txt in 1book

Hi, I have recently started studying your book on machine learning for web security. How was the file MasqueradeDat/label.txt generated? Your code uses it, but I don't see it in the original dataset. What is it for, and how is it produced?
