m2shad0w / blog
Home Page: https://m2shad0w.com/blog
:fire: :clap: :dog: blog
ifconfig
Get the network interface name. DNS queries are sent on port 53, so capture that port:
tcpdump -i enp3s0 -s 0 -w /var/tmp/dns.cap port 53
/etc/resolv.conf
# Generated by NetworkManager
#search hunliji.cn
nameserver 114.114.114.114
nameserver 223.6.6.6
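What search is for: it appends the hunliji.cn suffix to lookups, which is not how our domain requests are actually meant to resolve. Would simply removing search fix it?
The tool used is go-mysql-elasticsearch. I modified some of the original author's code to support simple geo location sync: the latitude and longitude fields are merged into a single field, and during incremental sync a change to the coordinates requires both fields to be present. The overall design idea is that the tool poses as a slave of the MySQL master.
Fields that depend on joins with other tables are not yet fully supported by the tool above; those are maintained offline by a scheduled Python job.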
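The merge rule described above (two coordinate columns become one field, and only when both are present) can be sketched as follows. This is a minimal Python illustration, not the modified Go code from go-mysql-elasticsearch; the column names `latitude` and `longitude` and the target field `location` are hypothetical, since the post does not name them.

```python
from typing import Optional


def merge_geo_point(row: dict, lat_field: str = "latitude",
                    lon_field: str = "longitude") -> Optional[dict]:
    """Return an Elasticsearch-style geo_point object, or None if incomplete.

    The field names are hypothetical placeholders.
    """
    lat = row.get(lat_field)
    lon = row.get(lon_field)
    # Both coordinates must be present; a partial row cannot form a point.
    if lat is None or lon is None:
        return None
    return {"lat": float(lat), "lon": float(lon)}


def build_document(row: dict) -> dict:
    """Copy the row and add a combined `location` field when possible."""
    doc = dict(row)
    point = merge_geo_point(row)
    if point is not None:
        doc["location"] = point
    return doc


full_row = {"id": 1, "latitude": "30.27", "longitude": "120.15"}
partial_row = {"id": 2, "latitude": "30.27"}
print(build_document(full_row))
print(build_document(partial_row))
```

A row that carries only one of the two coordinates is passed through without a `location` field, which mirrors the requirement that both fields be available during an incremental update.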
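Preparation
1.0 Mac dmg download link
1.1 Documentation: https://www.sublimetext.com/docs/3/
1.2 Install Package Control: https://packagecontrol.io/installation#st3
Installing plugins
2.0 SublimeTmpl: code template generation
2.1 GitGutter: shows code changes against version control
2.2 Code auto-completion plugin
Alias for opening Sublime Text from the terminal
alias sub='open -a "Sublime Text"'
Shortcut to open the directory containing the current file: SHIFT + COMMAND + T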
#!/bin/bash
# Modified following this issue: https://github.com/wbond/sublime_terminal/issues/89
CD_CMD="cd "\\\"$(pwd)\\\"" && clear"
if echo "$SHELL" | grep -E "/fish$" &> /dev/null; then
CD_CMD="cd "\\\"$(pwd)\\\""; and clear"
fi
VERSION=$(sw_vers -productVersion)
OPEN_IN_TAB=0
while [ "$1" != "" ]; do
PARAM="$1"
VALUE="$2"
case "$PARAM" in
--open-in-tab)
OPEN_IN_TAB=1
;;
esac
shift
done
if (( $(expr $VERSION '<' 10.7) )); then
RUNNING=$(osascript<<END
tell application "System Events"
count(processes whose name is "iTerm")
end tell
END
)
else
RUNNING=1
fi
if (( ! $RUNNING )); then
osascript<<END
tell application "iTerm"
tell current window
tell current session of (create tab with default profile)
write text "$CD_CMD"
end tell
end tell
activate
end tell
END
else
if (( $OPEN_IN_TAB )); then
osascript &>/dev/null <<EOF
tell application "iTerm"
if (count of windows) = 0 then
set theWindow to (create window with default profile)
set theSession to current session of theWindow
else
set theWindow to current window
tell current window
set theTab to create tab with default profile
set theSession to current session of theTab
end tell
end if
tell theSession
write text "$CD_CMD"
end tell
activate
end tell
EOF
else
osascript &>/dev/null <<EOF
tell application "iTerm"
tell (create window with default profile)
tell the current session
write text "$CD_CMD"
end tell
end tell
activate
end tell
EOF
fi
fi
// Preferences.sublime-settings
// The number of spaces a tab is considered equal to
"tab_size": 4,
// Set to true to insert spaces when tab is pressed
"translate_tabs_to_spaces": true,
https://packagecontrol.io/packages/Python%20PEP8%20Autoformat
OSX: ctrl+shift+r
https://github.com/revolunet/sublimetext-markdown-preview
Live preview: https://github.com/revolunet/sublimetext-markdown-preview#live-reload
Reference: https://wiki.python.org/moin/PyPiImplementations
I chose pypiserver: small and elegant, and still maintained within the past year.
export PRIVATE_PYPI=xxx
cd $PRIVATE_PYPI
virtualenv pypienv # create a virtualenv
source $PRIVATE_PYPI/pypienv/bin/activate
pip install pypiserver # install pypiserver
mkdir $PRIVATE_PYPI/package # create the directory that stores packages
#run-pypi.sh
#!/bin/sh
# activate the virtualenv
. $PRIVATE_PYPI/pypienv/bin/activate
exec pypi-server -p 3141 $PRIVATE_PYPI/package
pip install supervisor
echo_supervisord_conf > /etc/supervisord.conf # generate the config file
supervisord # start the daemon
# configure pypi-server
[program:pypi-server]
directory=/home/hadoop
command=sh run-pypi.sh
autostart=true
autorestart=true
startretries=3 ; number of automatic retries when startup fails, default 3
user=root ; user to run the program as
redirect_stderr=true ; redirect stderr to stdout, default false
stdout_logfile_maxbytes=20MB ; maximum size of the stdout log file, default 50MB
stdout_logfile_backups=20 ; number of stdout log file backups
; stdout log file; startup fails if the target directory does not exist, so create it manually (supervisord creates the log file itself)
stdout_logfile=/var/www/logs/pypi_stdout.log
# symlink the config
$ cd /etc/supervisor/conf.d/
$ sudo ln -s $PRIVATE_PYPI/pypi-supervisor.conf pypi-supervisor.conf
supervisorctl start pypi-server
sudo yum install httpd-tools # ubuntu apt-get install apache2-utils
htpasswd -sc $PRIVATE_PYPI/.htaccess user
exec pypi-server -p 3141 -P $PRIVATE_PYPI/.htaccess $PRIVATE_PYPI/package
Reload supervisor
sudo supervisorctl reload
Configure .pypirc on the build machine
[distutils]
index-servers=privatepypi
[privatepypi]
repository:url
username:your name
password:your passwd
python setup.py sdist upload -r privatepypi
pip install --extra-index-url path package-name --trusted-host path
Reference: https://github.com/pypiserver/pypiserver#quickstart-installation-and-usage
Hadoop: create an HDFS directory as a specific user
sudo -u hdfs hadoop fs -mkdir /user/myfile
hadoop fs -getmerge track_event/16-10-17/event_16-10-17_04_00_00 event_16-10-17_04_00_00.log
sudo -u hadoop hadoop distcp hdfs://master:9000//user/hadoop/track_event/16-12-02 hdfs://$HOST:8020/flume/track_event/dt=16-12-02
main : run as user is nobody
main : requested yarn user is hdfs
Can't create directory
Fix: change the permissions of the YARN MapReduce local directory,
for example chmod -R 777 /hadoop/yarn/local
sudo -u hadoop hadoop fs -ls hftp://$HOST:50070/user/hadoop/
Manual balance
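A while ago I migrated our logging service from JSON to protobuf, and performance improved a great deal. The charts below show the share of system resources used once each version is running under normal load; the request volumes are roughly the same.
protobuf carries 5 times the I/O load of JSON with about the same response time, and system resource consumption drops by half. For our log collection application this is a substantial performance gain.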
https://medium.com/@caffeinocode/bye-bye-json-welcome-protocol-buffers-a3e4319ba51
https://developers.google.com/protocol-buffers/docs/proto3
https://www.infoq.cn/article/json-is-5-times-faster-than-protobuf
make; make install
http_load
usage: http_load [-checksum] [-throttle] [-proxy host:port] [-verbose] [-timeout secs] [-sip sip_file]
-parallel N | -rate N [-jitter]
-fetches N | -seconds N
url_file
One start specifier, either -parallel or -rate, is required.
One end specifier, either -fetches or -seconds, is required.
http_load -parallel 5 -fetches 1000 url.txt
1000 fetches, 5 max parallel, 53000 bytes, in 3.66203 seconds
53 mean bytes/connection
273.072 fetches/sec, 14472.8 bytes/sec
msecs/connect: 0.053083 mean, 0.316 max, 0.017 min
msecs/first-response: 18.2421 mean, 70.557 max, 7.095 min
HTTP response codes:
code 200 -- 1000
1000 fetches, 5 max parallel, 1.46075e+06 bytes, in 801.51 seconds
1460.75 mean bytes/connection
1.24764 fetches/sec, 1822.5 bytes/sec
msecs/connect: 0.056571 mean, 0.306 max, 0.01 min
msecs/first-response: 3380.38 mean, 6363.63 max, 739.752 min
11 timeouts
11 bad byte counts
HTTP response codes:
code 200 -- 989
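The summary figures in the two http_load runs above can be re-derived from the raw counts, which makes the gap between the runs easier to see. The sketch below only does that arithmetic in Python; the function name `throughput` is illustrative and not part of http_load.

```python
def throughput(fetches: int, total_bytes: float, seconds: float) -> dict:
    """Recompute the rate figures that http_load prints in its summary."""
    return {
        "fetches_per_sec": fetches / seconds,
        "bytes_per_sec": total_bytes / seconds,
        "mean_bytes_per_connection": total_bytes / fetches,
    }


# Raw counts copied from the two runs above.
fast = throughput(1000, 53000, 3.66203)
slow = throughput(1000, 1.46075e+06, 801.51)

# Same number of fetches in both runs, so this is also the ratio of wall-clock times.
slowdown = fast["fetches_per_sec"] / slow["fetches_per_sec"]

print(fast)
print(slow)
print(f"throughput ratio between the runs: {slowdown:.1f}x")
```

Both runs issue 1000 fetches with 5 parallel connections, so the ratio of fetch rates (roughly 219x) is a direct measure of how much slower the second run was.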
# top output for the web service
25701 m2 20 0 1395112 191568 29836 S 99.3 0.6 13:59.36 python
import pandas as pd
...
# data = pd.DataFrame()
# for category, group in svm_data.groupby('category'):
# group_size = int(category_num_map.get(category, 0))
# if not group_size:
# continue
# group['group'] = map(lambda x: x / group_size, range(group.shape[0]))
# group['order'] = category_order_map.get(category, 99999)
# data = data.append(group)
This line turned out to be a hidden landmine: data = data.append(group)
Under the hood, DataFrame.append calls this function
Notes
If a list of dict/series is passed and the keys are all contained in the DataFrame’s index, the order of the columns in the resulting DataFrame will be unchanged.
Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once.
DataFrame.append returns a new object, and every call makes a copy.
Switching to a list gave a clear performance improvement.
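The list-based version might look like the sketch below: each group is collected in a plain Python list and concatenated once at the end with `pd.concat`. The sample `svm_data`, `category_num_map` and `category_order_map` are made-up stand-ins for the post's real data, and the integer division (`//`) is an assumption that the original Python 2 `x / group_size` was meant to produce integer bucket indices. As a side note, `DataFrame.append` was removed in pandas 2.0, so `pd.concat` is also the portable choice.

```python
import pandas as pd

# Hypothetical stand-ins for the post's svm_data and lookup maps.
svm_data = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "c"],
    "score": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
})
category_num_map = {"a": 2, "b": 1}   # "c" has no entry, so it is skipped
category_order_map = {"a": 1}         # "b" falls back to 99999

parts = []  # collect per-category frames instead of appending to a DataFrame
for category, group in svm_data.groupby("category"):
    group_size = int(category_num_map.get(category, 0))
    if not group_size:
        continue
    group = group.copy()  # work on a copy rather than a slice of svm_data
    # Assumed integer division: rows 0..n-1 are bucketed by group_size.
    group["group"] = [i // group_size for i in range(group.shape[0])]
    group["order"] = category_order_map.get(category, 99999)
    parts.append(group)

# A single concatenation at the end instead of one full copy per iteration.
data = pd.concat(parts) if parts else pd.DataFrame()
print(data)
```

Appending to a list is cheap because it only stores a reference to each group; the data is copied once, when `pd.concat` builds the final frame.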
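Large platform, many opportunities to exchange ideas, plenty of room to learn, year-end bonus
1. Work with large volumes of business data to analyze and implement product mining, user recommendation, seller analysis, user profiling, and so on;
2. Improve the personalized recommendation, advertising, search, machine learning, risk control, and crawler systems;
3. Communicate broadly with the data product team, provide day-to-day base data, and improve the efficiency of data operations.
Company history
March 2013: 婚礼纪 1.0 officially launched;
July 2013: received an angel investment of several million RMB from 青松基金;
November 2013: 婚礼纪 2.0 released, adding a merchant module and becoming a wedding e-commerce service platform;
August 2014: received a Series A of several million USD from 祥峰投资 (Vertex);
September 2014: launched the “新娘说” community feature;
December 2014: payment system completed, supporting online transactions;
April 2015: received a Series B in the tens of millions of USD, led by 经纬创投;
November 2015: launched the wedding goods e-commerce section;
March 2016: received a B+ round led by 复星昆仲, with 经纬 and 祥峰 following on; the two rounds total USD 30 million;
July 2016: released “云蝌”, the wedding industry's first SaaS system (since upgraded to “海草云”);
August 2016: added a finance entry point supporting the installment payment product “新婚贷”;
October 2016: opened the 婚礼纪 experience center in Hangzhou, the wedding industry's first new-format experience store built on a big data platform;
March 2018: closed a USD 65 million Series C led by 兰馨亚洲, with 经纬** and 复星集团 following on, and planned to launch a RMB 200 million industry investment fund to build out the upstream and downstream supply chain and create effective industry synergy through deep big-data integration;
June 2018: registered users passed 40 million;
June 2018: received a C1 round led by 上合资本, with existing shareholders 兰馨亚洲, 经纬** and 复星锐正 following on; cumulative funding exceeded USD 100 million;
July 2018: hosted the first 金犀奖 global wedding industry trend summit; the newly founded “金犀奖” was described by the industry and the media as the most important merchant rating guide, the wedding world's Michelin Red Guide.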
lu_fei#hunliji.com
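In our traffic-control design, Redis caches the real-time traffic values. The value of one key is a set with 30,000+ members, each a string of mostly 6 to 7 characters, so a single key-value pair is over 200 KB.
The monitoring charts show that both CPU usage and network usage were high in the first half of the period, and CPU usage showed spikes.
Faced with this, I recalled that oversized values hurt Redis QPS quite a lot, and that a payload size of about 1 KB is a performance inflection point.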
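The size of that set can be checked with a back-of-the-envelope estimate. The Python sketch below assumes the 30,000 members split evenly between 6- and 7-character strings (the post only says "mostly 6 to 7"), and it counts the bytes of a RESP2 array reply such as SMEMBERS would return. It does not model Redis's in-memory overhead, so it is only a rough lower bound on the cost of reading the key.

```python
def resp_bulk_string_bytes(payload_len: int) -> int:
    """Wire size of one RESP bulk string: '$', length digits, CRLF, payload, CRLF."""
    return 1 + len(str(payload_len)) + 2 + payload_len + 2


def estimate_set(members_by_len: dict) -> dict:
    """Estimate raw payload and array-reply size for a set of strings.

    members_by_len maps string length -> number of members of that length.
    """
    total_members = sum(members_by_len.values())
    raw = sum(length * count for length, count in members_by_len.items())
    # Array header: '*', element count digits, CRLF.
    wire = 1 + len(str(total_members)) + 2
    wire += sum(resp_bulk_string_bytes(length) * count
                for length, count in members_by_len.items())
    return {"members": total_members, "raw_bytes": raw, "reply_bytes": wire}


# Assumption: an even split between 6- and 7-character members.
estimate = estimate_set({6: 15000, 7: 15000})
print(estimate)
```

Under these assumptions the raw member data alone is about 195 KB and a full read of the set transfers several hundred kilobytes, far above the 1 KB inflection point mentioned above.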
# redis log file
sudo tail -f /var/log/redis/redis.log
18013:C 04 Jan 16:41:36.609 * RDB: 905 MB of memory used by copy-on-write
16206:M 04 Jan 16:41:36.630 * Background saving terminated with success
16206:M 04 Jan 16:42:37.018 * 10000 changes in 60 seconds. Saving...
16206:M 04 Jan 16:42:37.035 * Background saving started by pid 18074
18074:C 04 Jan 16:42:50.182 * DB saved on disk
18074:C 04 Jan 16:42:50.191 * RDB: 918 MB of memory used by copy-on-write
16206:M 04 Jan 16:42:50.289 * Background saving terminated with success
The log above essentially pins the problem on the CPU cost of persisting data to disk, which is tied to Redis's default configuration:
sudo vi /etc/redis.conf
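To make the log excerpt above easier to read, the following Python sketch pulls out three things from one save cycle: the rule that triggered the snapshot, how long the background save took, and how much memory copy-on-write consumed. The log lines are copied from the excerpt; the year is absent from the log, so an arbitrary one is prepended purely to make the timestamps parseable. The trigger "10000 changes in 60 seconds" corresponds to a `save 60 10000` snapshot rule, which, as far as I know, was one of the defaults shipped in redis.conf at the time.

```python
import re
from datetime import datetime

LOG = """\
16206:M 04 Jan 16:42:37.018 * 10000 changes in 60 seconds. Saving...
16206:M 04 Jan 16:42:37.035 * Background saving started by pid 18074
18074:C 04 Jan 16:42:50.182 * DB saved on disk
18074:C 04 Jan 16:42:50.191 * RDB: 918 MB of memory used by copy-on-write
16206:M 04 Jan 16:42:50.289 * Background saving terminated with success
"""

LINE_RE = re.compile(r"^\d+:\w (\d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) \* (.*)$")


def parse_events(log_text: str) -> list:
    """Return (timestamp, message) pairs for lines in the format shown above."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        stamp, message = match.groups()
        # The log has no year; "2000" is an arbitrary placeholder for parsing.
        when = datetime.strptime("2000 " + stamp, "%Y %d %b %H:%M:%S.%f")
        events.append((when, message))
    return events


def summarize(events: list) -> dict:
    """Extract the save trigger, save duration and copy-on-write size."""
    summary = {"trigger": None, "save_seconds": None, "cow_mb": None}
    started = None
    for when, message in events:
        trigger = re.match(r"(\d+) changes in (\d+) seconds", message)
        cow = re.match(r"RDB: (\d+) MB of memory used by copy-on-write", message)
        if trigger:
            summary["trigger"] = (int(trigger.group(1)), int(trigger.group(2)))
        elif message.startswith("Background saving started"):
            started = when
        elif cow:
            summary["cow_mb"] = int(cow.group(1))
        elif message.startswith("Background saving terminated") and started:
            summary["save_seconds"] = (when - started).total_seconds()
    return summary


summary = summarize(parse_events(LOG))
print(summary)
```

With saves firing about once a minute and each one taking over ten seconds while copying roughly 900 MB, a noticeable share of every minute goes to background persistence, which fits the CPU spikes described above.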
https://ningyu1.github.io/site/post/32-redis-aof/
https://www.cnblogs.com/mindwind/p/5067905.html
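On Ubuntu, a lot of software is distributed as a tar archive that works as soon as it is extracted, but it comes without an icon and cannot be pinned to the Dash.
For example, to add a launcher for GoLand, do the following: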
sudo gedit /usr/share/applications/goland.desktop
#!/usr/bin/env xdg-open
[Desktop Entry]
Encoding=UTF-8
Name=GoLand
Comment=go lang develop tool
## software location
Exec=/home/m2/software/GoLand-2018.2.1/bin/goland
Icon=/home/m2/software/GoLand-2018.2.1/bin/goland.png
Terminal=false
StartupNotify=true
Type=Application
Categories=Application;Development;
sudo add-apt-repository ppa:hzwhuang/ss-qt5
sudo apt-get update
sudo apt-get install shadowsocks-qt5
Ubuntu shell copy and paste
"Select = copy, middle click = paste." "Ctrl+Shift+C = copy, Ctrl+Shift+V = paste."
sudo apt-get install libgnome2-bin
gnome-open file