Senior Software Engineer | Senior DevOps Engineer
my personal notes
based on GitOps idea
This is a well-worn topic, but there are still plenty of details to watch out for.
https://pkg.go.dev/github.com/pkg/errors#hdr-Retrieving_the_stack_trace_of_an_error_or_wrapper
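Print the stack trace of an error:
// stackTracer is the interface pkg/errors values implement.
type stackTracer interface {
	StackTrace() errors.StackTrace
}
if err, ok := err.(stackTracer); ok {
	for _, f := range err.StackTrace() {
		fmt.Printf("%+s:%d\n", f, f)
	}
}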
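To see what "an error that carries a stack trace" means mechanically, here is a minimal stdlib-only sketch. It is not how pkg/errors is implemented; `tracedError`, `newTraced`, `loadConfig`, and `topFrameMentions` are hypothetical names made up for this note, and the technique shown is plain `runtime.Callers` capture at error creation time.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"strings"
)

// tracedError is a hypothetical error type that remembers where it was created.
type tracedError struct {
	msg string
	pcs []uintptr
}

func (e *tracedError) Error() string { return e.msg }

// newTraced captures the call stack of its caller.
func newTraced(msg string) *tracedError {
	pcs := make([]uintptr, 32)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and newTraced itself
	return &tracedError{msg: msg, pcs: pcs[:n]}
}

// frames renders the captured stack as "function file:line" strings.
func (e *tracedError) frames() []string {
	var out []string
	if len(e.pcs) == 0 {
		return out
	}
	it := runtime.CallersFrames(e.pcs)
	for {
		f, more := it.Next()
		out = append(out, fmt.Sprintf("%s %s:%d", f.Function, f.File, f.Line))
		if !more {
			break
		}
	}
	return out
}

// loadConfig is a stand-in for any function that fails.
func loadConfig() error {
	return newTraced("config not found")
}

// topFrameMentions reports whether the innermost recorded frame names fn.
func topFrameMentions(err error, fn string) bool {
	var te *tracedError
	if !errors.As(err, &te) {
		return false
	}
	fr := te.frames()
	return len(fr) > 0 && strings.Contains(fr[0], fn)
}

func main() {
	err := loadConfig()
	var te *tracedError
	if errors.As(err, &te) {
		for _, f := range te.frames() {
			fmt.Println(f)
		}
	}
}
```

The innermost frame is the function that created the error, which is the same information the pkg/errors loop prints first.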
import (
	_ "embed"
)

//go:embed templates/es.remoteService.yaml
var esRemoteServiceYAML string
BDD (Behaviour Driven Development)
The signature BDD phrases are Given, When, and Then.
Python has the behave library.
A typical example:
from behave import *

@given('I am on home page')
def step_i_am_on_home_page(context):
    context.driver.get("http://demo.magentocommerce.com/")

@when('I search for {text}')
def step_i_search_for(context, text):
    search_field = context.driver.find_element_by_name("q")
    search_field.clear()
    # enter search keyword and submit
    search_field.send_keys(text)
    search_field.submit()

@then('I should see list of matching products in search results')
def step_i_should_see_list(context):
    products = context.driver.find_elements_by_xpath("//h2[@class='product-name']/a")
    # check count of products shown in results
    assert len(products) > 0
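As for how it works under the hood, step text is matched to step functions using regular-expression-style patterns.
This makes debugging the code very painful.
My take is that whether it is worth it depends on who actually writes the BDD test cases.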
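The pattern-matching idea can be sketched in a few lines of Go. This is a toy illustration, not behave's implementation: `registry`, `register`, `dispatch`, and `runSearchScenario` are hypothetical names, and the `{text}` placeholder is written directly as a regex capture group.

```go
package main

import (
	"fmt"
	"regexp"
)

// stepFunc is a hypothetical handler type: it receives the values captured
// from the step text.
type stepFunc func(args []string) error

// step pairs a compiled pattern with its handler.
type step struct {
	pattern *regexp.Regexp
	run     stepFunc
}

// registry holds the registered steps in registration order.
type registry struct {
	steps []step
}

// register compiles an anchored pattern and stores the handler.
func (r *registry) register(pattern string, fn stepFunc) {
	r.steps = append(r.steps, step{pattern: regexp.MustCompile("^" + pattern + "$"), run: fn})
}

// dispatch finds the first step whose pattern matches the line and calls it
// with the captured groups.
func (r *registry) dispatch(line string) error {
	for _, s := range r.steps {
		if m := s.pattern.FindStringSubmatch(line); m != nil {
			return s.run(m[1:])
		}
	}
	return fmt.Errorf("no step matches %q", line)
}

// runSearchScenario registers two steps and returns the keyword captured
// while dispatching the scenario lines.
func runSearchScenario() (string, error) {
	var searched string
	r := &registry{}
	r.register(`I am on home page`, func([]string) error { return nil })
	r.register(`I search for (.+)`, func(args []string) error {
		searched = args[0]
		return nil
	})
	for _, line := range []string{"I am on home page", "I search for shoes"} {
		if err := r.dispatch(line); err != nil {
			return "", err
		}
	}
	return searched, nil
}

func main() {
	kw, err := runSearchScenario()
	fmt.Println(kw, err)
}
```

The debugging pain follows from this design: the link between a scenario line and the function that runs is a string match resolved at run time, so there is no call site to jump to.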
Config Connector is an open source Kubernetes addon that allows you to manage Google Cloud resources through Kubernetes.
sqlx: https://github.com/jmoiron/sqlx
Schema Sample
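Differences from MySQL
With MySQL, the driver result hands back write metadata such as the last insert ID directly; with PostgreSQL (for example via lib/pq, which does not support LastInsertId), use a RETURNING clause to get data back from a write.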
Simplify Terraform scripts
{
"displayName": "99% - Distribution Cut - Calendar month",
"goal": 0.99,
"calendarPeriod": "MONTH",
"serviceLevelIndicator": {
"requestBased": {
"distributionCut": {
"distributionFilter": "metric.type=\"custom.googleapis.com/opencensus/grpc.io/client/roundtrip_latency\" resource.type=\"global\"",
"range": {
"min": -9007199254740991,
"max": 100
}
}
}
}
}
Main use case: to access GCP services, avoid exposing service account key.
Bind IAM service account with Kubernetes service account
Control Plane Components
Node Components:
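The most basic function of a Helm chart is templating.
A more advanced function is managing related dependencies and tests.
kubectl core:
Kubernetes source code analysis, part 2: the Visitor design pattern implementation and the details of sending a pod creation request
09 | Go programming patterns: the Kubernetes Visitor pattern
The most complete guide index to design patterns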
Authorize actions in clusters using role-based access control
A Kubernetes role binding supports four kinds of subjects (https://cloud.google.com/kubernetes-engine/docs/how-to/role-based-access-control#rolebinding)
The minimum required permission is container.clusters.get
In almost all cases, Kubernetes RBAC can be used instead of IAM. GKE users require at minimum, the container.clusters.get IAM permission in the project that contains the cluster. This permission is included in the container.clusterViewer role, and in other more highly privileged roles. The container.clusters.get permission is required for users to authenticate to the clusters in the project, but does not authorize them to perform any actions inside those clusters. Authorization may then be provided by either IAM or Kubernetes RBAC.
Example:
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: pod-reader-binding
  namespace: accounting
subjects:
# Google Cloud user account
- kind: User
  name: [email protected]
# Kubernetes service account
- kind: ServiceAccount
  name: johndoe
# IAM service account
- kind: User
  name: [email protected]
# Google Group
- kind: Group
  name: [email protected]
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
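The main open question now is how to bind a Role to an IAM role group.
In Kubernetes, a ServiceAccount is mainly used to grant permissions to Pods.
A ServiceAccount can be bound to a Role through a RoleBinding.
An Ingress exposes Services to external traffic.
Example: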
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - path: /foo
        pathType: Prefix
        backend:
          service:
            name: foo-service
            port:
              number: 3000
      - path: /bar
        pathType: Prefix
        backend:
          service:
            name: bar-service
            port:
              number: 6000
  - host: foo.example.com
    http:
      paths:
      - pathType: Prefix
        path: "/foo"
        backend:
          service:
            name: foo-service-2
            port:
              number: 80
  - host: "*.foo.example.com"
    http:
      paths:
      - pathType: Prefix
        path: "/foo"
        backend:
          service:
            name: foo-service-3
            port:
              number: 8080
Load-testing tools
quit signal for kubernetes job to exit correctly: https://stackoverflow.com/questions/54921054/terminate-istio-sidecar-istio-proxy-for-a-kubernetes-job-cronjob
Current course:
KodeKloud: Terraform Basics Training Course
Understanding resources, data sources, and variables:
useful providers:
from docker
docker run -d --name sonarqube -e SONAR_ES_BOOTSTRAP_CHECKS_DISABLE=true -p 9000:9000 sonarqube:latest
When running the scanner from Docker, its container must be on a Docker network that can reach the SonarQube server.
docker run \
    --rm \
    -e SONAR_HOST_URL="http://${SONARQUBE_URL}" \
    -e SONAR_SCANNER_OPTS="-Dsonar.projectKey=${YOUR_PROJECT_KEY}" \
    -e SONAR_LOGIN="myAuthenticationToken" \
    -v "${YOUR_REPO}:/usr/src" \
    sonarsource/sonar-scanner-cli
Use a here-document to pass multiple lines:
<<DELIMITER
hello
world
DELIMITER
The delimiter can be any string. Most people use EOF as the delimiter.
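Something like terraform graph, which generates a .dot file that Graphviz can then render to an SVG image.
We built a WebSocket server: one end lives on our server, and the WebSocket client has to be deployed inside the VPC.
The two ends are connected over WebSocket, which turned what was originally a one-way request into a fairly complex two-way interactive system.
The problems with this approach:
One way to solve the problem at its root is to configure the firewall so that our server's IP is allowed into the VPC.
Alternatively, consider a pub/sub approach: the client subscribes to SQS and sends the webhook request as soon as it receives a message.
Other options such as gRPC streaming would also work.
This was a bad project that I was involved in from start to finish. There was not enough time for research before it started, and by the time the architectural problems surfaced, the lead had already demoed it to senior management and moved on to implementation details. By the time anyone tried to stop it, it was too late and it had to ship.
Some problems were not anticipated at the start, such as frequent connection drops, which showed up as soon as we tested.
The main principle of NAT traversal: the internal network initiates the connection to the external network; once the connection is established, the external side can send messages to the internal side.
Also: the pub/sub pattern avoids NAT traversal altogether.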
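The "inside dials out, outside then pushes" principle can be shown with plain TCP on loopback. This is only a sketch under simplifying assumptions: there is no real NAT or WebSocket here, and `pushThroughOutboundConn` is a hypothetical name. The point is that the server side never initiates a connection, yet it is the one that sends the message.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// pushThroughOutboundConn simulates the pattern: the "public" server only
// listens, the "inside" client dials out, and the server then pushes a
// message down the connection the client opened.
func pushThroughOutboundConn(msg string) (string, error) {
	// Stand-in for the publicly reachable server (loopback, random port).
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()

	errc := make(chan error, 1)
	go func() {
		conn, err := ln.Accept() // the server never dials the client
		if err != nil {
			errc <- err
			return
		}
		defer conn.Close()
		_, err = fmt.Fprintln(conn, msg) // push over the client's connection
		errc <- err
	}()

	// Stand-in for the client inside the VPC: it initiates the connection.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()

	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	if err := <-errc; err != nil {
		return "", err
	}
	return line[:len(line)-1], nil // drop the trailing newline
}

func main() {
	got, err := pushThroughOutboundConn("webhook event")
	fmt.Println(got, err)
}
```

The weakness noted above is visible in the sketch too: everything depends on that one long-lived connection staying up.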
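Main concepts:
OpenTelemetry quick start: mostly sample code
The trace ID and span ID live in the context; passing the context along is what keeps spans linked.
Always call SetTextMapPropagator, otherwise traces cannot cross process boundaries.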
import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/sdk/resource"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.4.0"
)

func SetGlobalTracer(serviceName string, exporterAddress string, exporterPort string) error {
	exporter, err := jaeger.New(jaeger.WithAgentEndpoint(
		jaeger.WithAgentHost(exporterAddress),
		jaeger.WithAgentPort(exporterPort),
	))
	if err != nil {
		return err
	}
	tp := tracesdk.NewTracerProvider(
		tracesdk.WithBatcher(exporter),
		tracesdk.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String(serviceName),
		)),
	)
	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.NewCompositeTextMapPropagator(propagation.TraceContext{}))
	return nil
}
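Why the propagator matters: across processes the IDs have to travel in a request header. With the TraceContext propagator that header is the W3C `traceparent` (`version-traceid-spanid-flags`). The stdlib-only sketch below mimics the inject/extract round trip; `spanContext`, `inject`, `extract`, and `roundTrip` are hypothetical names, and validation is deliberately minimal compared with a real propagator.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// spanContext is a hypothetical, stripped-down stand-in for the IDs a
// propagator copies between processes.
type spanContext struct {
	TraceID string // 32 lowercase hex chars
	SpanID  string // 16 lowercase hex chars
	Sampled bool
}

// inject writes the IDs into the "traceparent" header of an outgoing request.
func inject(h http.Header, sc spanContext) {
	flags := "00"
	if sc.Sampled {
		flags = "01"
	}
	h.Set("traceparent", "00-"+sc.TraceID+"-"+sc.SpanID+"-"+flags)
}

// extract reads the header back on the receiving side. It only checks the
// field count and lengths, and treats flags "01" as sampled.
func extract(h http.Header) (spanContext, bool) {
	parts := strings.Split(h.Get("traceparent"), "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return spanContext{}, false
	}
	return spanContext{TraceID: parts[1], SpanID: parts[2], Sampled: parts[3] == "01"}, true
}

// roundTrip simulates caller and callee: inject on one side, extract on the other.
func roundTrip(sc spanContext) (spanContext, bool) {
	h := http.Header{}
	inject(h, sc)
	return extract(h)
}

func main() {
	sc := spanContext{TraceID: "4bf92f3577b34da6a3ce929d0e0e4736", SpanID: "00f067aa0ba902b7", Sampled: true}
	got, ok := roundTrip(sc)
	fmt.Println(got.TraceID, got.SpanID, got.Sampled, ok)
}
```

Without a propagator installed, nothing writes or reads this header, so the callee starts a fresh trace.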
Add attributes to a span; in Jaeger they show up as tags.
tr := otel.Tracer("component-http")
ctx, span := tr.Start(ctx, "http", trace.WithAttributes(attribute.Key("attr-http").String("hello http")))
span.SetAttributes(attribute.Key("http.method").String("GET"))
span.SetAttributes(attribute.Key("http.url").String(url))
span.AddEvent("Init")
span.AddEvent("End")
func parentFunction(ctx context.Context) {
	tracer := otel.Tracer("component-parent")
	ctx, span := tracer.Start(ctx, "parent")
	defer span.End()
	// call our child function
	childFunction(ctx)
}

func childFunction(ctx context.Context) {
	tracer := otel.Tracer("component-child")
	ctx, span := tracer.Start(ctx, "child")
	defer span.End()
	childFunction2(ctx)
}
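What makes the child a child is only the `ctx` that gets passed down. A toy model of that mechanism, using nothing but `context.WithValue`, might look like this; `toyTracer`, `toySpan`, and `traceParentChild` are hypothetical names and this is not the OpenTelemetry tracer.

```go
package main

import (
	"context"
	"fmt"
)

// spanKey is the private context key under which the current span name is stored.
type spanKey struct{}

// toySpan records a span's name and the name of its parent ("" for a root).
type toySpan struct {
	Name   string
	Parent string
}

// toyTracer collects spans in the order they were started.
type toyTracer struct {
	Spans []toySpan
}

// Start reads the parent from ctx, records the new span, and returns a
// derived context in which the new span is the current one.
func (t *toyTracer) Start(ctx context.Context, name string) context.Context {
	parent, _ := ctx.Value(spanKey{}).(string)
	t.Spans = append(t.Spans, toySpan{Name: name, Parent: parent})
	return context.WithValue(ctx, spanKey{}, name)
}

func parent(ctx context.Context, t *toyTracer) {
	ctx = t.Start(ctx, "parent")
	child(ctx, t) // passing ctx is what links child to parent
}

func child(ctx context.Context, t *toyTracer) {
	_ = t.Start(ctx, "child")
}

// traceParentChild runs the call chain and returns the recorded spans.
func traceParentChild() []toySpan {
	t := &toyTracer{}
	parent(context.Background(), t)
	return t.Spans
}

func main() {
	for _, s := range traceParentChild() {
		fmt.Printf("%s (parent=%q)\n", s.Name, s.Parent)
	}
}
```

If `child` were called with `context.Background()` instead of `ctx`, it would be recorded as a second root span.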
https://opentelemetry.io/docs/instrumentation/go/getting-started/#bonus-errors
import (
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
	"go.opentelemetry.io/otel/trace"
)
span.RecordError(err)
span.SetStatus(codes.Error, err.Error())
https://segmentfault.com/a/1190000042031697
Go - Step by step guide for implementing tracing on a microservices architecture
Setting up open telemetry for golang on google cloud platform
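Diagnostic logs (for debugging), statistical logs (for user billing), audit logs
Not too much
Not too little
Points that are easy to miss
TRACE, DEBUG, INFO, WARN, ERROR, FATAL
A project's log levels must be followed consistently by everyone.
For a simple system, the RequestID can simply be a random number.
For a complex system, information such as the IP of the server handling the request and the time the request was received can be encoded into the RequestID.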
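One possible encoding, as a sketch: pack the server IPv4 address, the receive time, and a few random bytes into a fixed-width hex string. The 4+8+4 byte layout and the names `newRequestID`, `parseRequestID`, and `roundTripRequestID` are assumptions made up for this note, not a standard.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"net"
	"time"
)

// newRequestID packs the server's IPv4 address (4 bytes), the receive time
// in Unix milliseconds (8 bytes), and 4 random bytes into a 32-char hex string.
func newRequestID(serverIP net.IP, received time.Time) (string, error) {
	ip4 := serverIP.To4()
	if ip4 == nil {
		return "", errors.New("IPv4 address required")
	}
	buf := make([]byte, 16)
	copy(buf[0:4], ip4)
	binary.BigEndian.PutUint64(buf[4:12], uint64(received.UnixMilli()))
	if _, err := rand.Read(buf[12:16]); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// parseRequestID recovers the server IP and receive time from an ID.
func parseRequestID(id string) (net.IP, time.Time, error) {
	buf, err := hex.DecodeString(id)
	if err != nil || len(buf) != 16 {
		return nil, time.Time{}, errors.New("malformed request ID")
	}
	ms := int64(binary.BigEndian.Uint64(buf[4:12]))
	return net.IP(buf[0:4]), time.UnixMilli(ms), nil
}

// roundTripRequestID builds an ID and decodes it again, returning the
// decoded IP (as text) and receive time (as Unix milliseconds).
func roundTripRequestID(ip string, unixMilli int64) (string, int64, error) {
	id, err := newRequestID(net.ParseIP(ip), time.UnixMilli(unixMilli))
	if err != nil {
		return "", 0, err
	}
	gotIP, gotTime, err := parseRequestID(id)
	if err != nil {
		return "", 0, err
	}
	return gotIP.String(), gotTime.UnixMilli(), nil
}

func main() {
	id, err := newRequestID(net.ParseIP("10.1.2.3"), time.Now())
	fmt.Println(id, err)
}
```

With an ID like this, a single log line is enough to tell which server handled the request and when, without a lookup.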
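INFO-level logs usually record the routine running state of the system and the basic input and output of each request.
DEBUG logs record in detail how a request is processed, even the input and output of every function. Tracking down deeply hidden problems depends on DEBUG logs.
DEBUG logs are typically an order of magnitude larger in volume than INFO logs.
Common approach:
Business layer: when a request contains DEBUG=ON, output the related DEBUG-level logs.
LB level: implement the following interface in OpenResty at the load-balancing layer: an administrator can configure which operation on which object in which bucket of which user (these are object-storage concepts) should produce DEBUG logs. OpenResty filters every request, and when a request matches the configured DEBUG output conditions, it adds a "DEBUG=ON" parameter to the request's query string.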
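Both halves of this scheme are small. A rough Go sketch, with the OpenResty side reduced to a single function: `markDebug` plays the load balancer for a request that already matched the configured conditions, and `minLevel` plays the service. All names here are hypothetical, and the log output is collected in a slice only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"net/url"
)

type level int

const (
	levelDebug level = iota
	levelInfo
)

// markDebug is what the load balancer would do for a matching request:
// add DEBUG=ON to the query string.
func markDebug(rawQuery string) string {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return rawQuery // leave malformed queries untouched
	}
	q.Set("DEBUG", "ON")
	return q.Encode()
}

// minLevel is what the service would do: pick the lowest level to emit
// for this one request.
func minLevel(rawQuery string) level {
	q, err := url.ParseQuery(rawQuery)
	if err == nil && q.Get("DEBUG") == "ON" {
		return levelDebug
	}
	return levelInfo
}

// handle simulates processing one request and returns the log lines that
// were actually emitted.
func handle(rawQuery string) []string {
	floor := minLevel(rawQuery)
	var out []string
	logf := func(lv level, msg string) {
		if lv >= floor {
			out = append(out, msg)
		}
	}
	logf(levelInfo, "INFO request received")
	logf(levelDebug, "DEBUG parsed query "+rawQuery)
	return out
}

func main() {
	fmt.Println(len(handle("bucket=photos")))            // INFO only
	fmt.Println(len(handle(markDebug("bucket=photos")))) // INFO and DEBUG
}
```

The level is decided per request, so DEBUG volume is paid only for the traffic an administrator is actually investigating.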
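When a service receives a request it records the receive time (T1), and when processing is done and the response is about to be sent it records the send time (T2). A request's log is normally written at INFO level, but when the processing time (T2-T1) exceeds a threshold (for example 10s), the log can be promoted to WARN level. This makes it possible to spot potential problems in the system early.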
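The promotion rule is a one-line comparison. A minimal sketch, using the 10s threshold from the example above; `levelForRequest` and the demo helper `levelAfter` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

type level string

const (
	levelInfo level = "INFO"
	levelWarn level = "WARN"
)

// slowThreshold mirrors the 10s example from the notes.
const slowThreshold = 10 * time.Second

// levelForRequest picks the level for a request's log line from its
// receive time (T1) and send time (T2).
func levelForRequest(t1, t2 time.Time) level {
	if t2.Sub(t1) > slowThreshold {
		return levelWarn
	}
	return levelInfo
}

// levelAfter is a small helper for the demo: it fabricates T1 and T2 that
// are the given number of seconds apart.
func levelAfter(seconds int) level {
	t1 := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return levelForRequest(t1, t1.Add(time.Duration(seconds)*time.Second))
}

func main() {
	fmt.Println(levelAfter(2), levelAfter(11))
}
```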
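Monitoring logs for keywords makes it possible to detect system failures and raise alerts promptly, which is essential for meeting the service's SLA.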
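In its simplest form, keyword monitoring is a scan over log lines plus a threshold. A sketch under that assumption (real setups usually do this in the log pipeline rather than in application code); `countKeywordHits` and `shouldAlert` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countKeywordHits scans log text line by line and counts, per keyword,
// how many lines contain it.
func countKeywordHits(logs string, keywords []string) map[string]int {
	hits := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		for _, kw := range keywords {
			if strings.Contains(sc.Text(), kw) {
				hits[kw]++
			}
		}
	}
	return hits
}

// shouldAlert reports whether any keyword reached the threshold.
func shouldAlert(hits map[string]int, threshold int) bool {
	for _, n := range hits {
		if n >= threshold {
			return true
		}
	}
	return false
}

func main() {
	logs := "INFO request ok\nERROR disk full\nFATAL out of memory\nERROR upstream timeout\n"
	hits := countKeywordHits(logs, []string{"ERROR", "FATAL"})
	fmt.Println(hits["ERROR"], hits["FATAL"], shouldAlert(hits, 2))
}
```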
Observing logs after a release
Writing logs to different files
Log file size
Kubernetes Alerting | Best Practices in 2022
Deployments and Pods:
Application
Disk Usage Warning
Network Connectivity Issues
Pods that aren't working
Node resource Consumption
Missing pods
Container restarts
Reference: Measure your golden signals with GKE Managed Prometheus and the nginx-ingress
Deploy the test components
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.metrics.enabled=true
Deploy PodMonitoring
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: ingress-nginx-metrics-monitoring
  namespace: ingress-nginx
spec:
  endpoints:
  - interval: 5s
    port: 10254
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
Testing permissions
testIamPermissions() is used to determine whether a user has access to the dashboard tool.
IAM basic and predefined roles reference
basic roles: