doktor-v2's Introduction

doktor-v2

Doktor is a demo web application built on a microservice architecture. It lets users search for and download technical reports.

Architecture

The microservices are deployed on a Kubernetes cluster, and Istio provides the service mesh.

doktor-v2 architecture

Development

If you are interested in contributing, see the developer guides.

The API documentation is available here:

https://cdsl-research.github.io/doktor-v2/

Branch Policy

  • master
    • Latest stable release
    • Open pull requests against this branch
  • staging
  • production

Directory Structure

Tools:

  • deploy: deployment scripts
  • dev_tools: development scripts

Microservices:

  • author: manages authors
  • front: provides the web UI for end users
  • front-admin: provides the management console
  • fulltext: provides full-text search for papers
  • paper: manages papers
  • stats: manages access history
  • textize: extracts text from PDF files
  • thumbnail: manages figures in papers

doktor-v2's People

Contributors

github-actions[bot], kei-gnu, takahyon, takayu112233, tomoyk

Forkers

tomoyk

doktor-v2's Issues

Configure timestamped logs in Uvicorn

Writing logs

Adding the following is enough:

import logging
logger = logging.getLogger("uvicorn")

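Reference: "uvicorn にアプリのloggerがでなかったので python の logger を理解した" (Qiita article; roughly, "My app's logger produced no output under uvicorn, so I studied Python's logger")

Adding timestamps

Use the --log-config option to add the date and time to each log line.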

uvicorn ... --log-config

https://www.uvicorn.org/settings/#logging

Example configuration file

version: 1
disable_existing_loggers: False
formatters:
  timestamped:
    format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: timestamped
    stream: ext://sys.stdout
root:
  level: INFO
  handlers: [console]
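
To preview what the `timestamped` formatter produces without starting Uvicorn, the same settings can be loaded through the standard library's `logging.config.dictConfig`. This is a minimal sketch: the dictionary mirrors the YAML above, while `LOG_CONFIG` and `format_sample` are names invented for this illustration and are not part of doktor-v2.

```python
import logging
import logging.config

# The YAML example above, written as an equivalent Python dict.
LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "timestamped": {
            "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "timestamped",
            "stream": "ext://sys.stdout",
        },
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}


def format_sample(message: str) -> str:
    """Apply the config and return one formatted line (hypothetical helper)."""
    logging.config.dictConfig(LOG_CONFIG)
    # After dictConfig, the root logger carries the "console" handler.
    handler = logging.getLogger().handlers[0]
    record = logging.LogRecord(
        name="uvicorn", level=logging.INFO, pathname="example", lineno=0,
        msg=message, args=None, exc_info=None,
    )
    return handler.format(record)


if __name__ == "__main__":
    print(format_sample("Application startup complete."))
```

Each line then starts with the `asctime` field, followed by the logger name, level, and message.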

Add it to the CMD in the Dockerfile

CMD ["main:app", "--host", "0.0.0.0"]

Add full-text search

Add a feature for searching the contents of papers.

  • Transcription (PDF → text)
  • Full-text search

Feature for aggregating access counts

Create a stats service that aggregates access counts per paper and per author.

This is needed to implement sorting by access count.
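
The aggregation itself can be sketched in a few lines. The example below assumes the access history is available as a sequence of paper or author UUIDs, one entry per access; that input shape and the function name `rank_by_access` are assumptions made for illustration, not the stats service's actual design.

```python
from collections import Counter
from typing import Iterable


def rank_by_access(access_log: Iterable[str]) -> list[tuple[str, int]]:
    """Count accesses per UUID and return (uuid, count) pairs, most accessed first."""
    counts = Counter(access_log)
    # Sort by count descending, then by UUID so that ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    log = ["paper-a", "paper-b", "paper-a", "paper-c", "paper-b", "paper-a"]
    for uuid, count in rank_by_access(log):
        print(uuid, count)
```

The resulting order is what a "sort by access count" view would need from the service.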

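Set up the development environment with Telepresence

We want to clean up the scattered docker-compose files.

Running everything locally with docker-compose becomes impractical as the number of services grows.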

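Bucket missing when paper-app places a file in paper-minio

If MinIO is already running before the app starts, the app's code initializes the bucket.
If MinIO starts after the app has started, however, the app's code never initializes the bucket.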

koyama@doktor-stg1:~/doktor-v2/deploy$ k logs pod/paper-app-deploy-848cb5c945-fcwd7
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     10.42.0.143:49230 - "GET /paper HTTP/1.1" 200 OK
insert_id: 614a7015a9f06c3a4c8419f3
INFO:     10.42.0.148:48688 - "POST /paper HTTP/1.1" 200 OK
11
Upload exception:  S3 operation failed; code: NoSuchBucket, message: The specified bucket does not exist, resource: /paper, request_id: 16A6FB6121DAD1E8, host_id: 9376f326-6dd1-48d3-9a8d-9d86e42ff256, bucket_name: paper
INFO:     10.42.0.148:48688 - "POST /paper/d2d31f49-fe26-4ec2-9b0e-b22395078b22/upload HTTP/1.1" 503 Service Unavailable
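
One way to remove the dependence on startup order is to check for the bucket right before each upload instead of only at application startup. The sketch below is written against a small protocol whose two methods, `bucket_exists` and `make_bucket`, mirror the names used by the MinIO Python SDK client. The `BucketClient` protocol, the `ensure_bucket` helper, and the in-memory stand-in are hypothetical and are not taken from the paper service.

```python
from typing import Protocol


class BucketClient(Protocol):
    """Subset of an object-storage client used here (assumed interface)."""

    def bucket_exists(self, bucket_name: str) -> bool: ...

    def make_bucket(self, bucket_name: str) -> None: ...


def ensure_bucket(client: BucketClient, bucket_name: str) -> bool:
    """Create the bucket if it is missing. Returns True if it was created."""
    if client.bucket_exists(bucket_name):
        return False
    client.make_bucket(bucket_name)
    return True


class InMemoryClient:
    """Stand-in for a real client so the sketch runs without a MinIO server."""

    def __init__(self) -> None:
        self.buckets: set[str] = set()

    def bucket_exists(self, bucket_name: str) -> bool:
        return bucket_name in self.buckets

    def make_bucket(self, bucket_name: str) -> None:
        self.buckets.add(bucket_name)


if __name__ == "__main__":
    client = InMemoryClient()
    print(ensure_bucket(client, "paper"))  # created on first use
    print(ensure_bucket(client, "paper"))  # already present
```

Calling `ensure_bucket` at the top of the upload handler costs one extra request per upload, but it works regardless of which container starts first.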

New container images frequently cannot be pulled because of the Docker Hub rate limit

  Warning  Failed     19s (x3 over 79s)   kubelet            Failed to pull image "prom/prometheus:v2.26.0": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/prom/prometheus:v2.26.0": failed to copy: httpReaderSeeker: failed open: unexpected status code https://registry-1.docker.io/v2/prom/prometheus/manifests/sha256:38d40a760569b1c5aec4a36e8a7f11e86299e9191b9233672a5d41296d8fa74e: 429 Too Many Requests - Server message: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit

This is a serious problem.
We may need to set imagePullPolicy: IfNotPresent.

Track down the error in the front container

One endpoint of the front container, GET /, returns 503.

doktor-v2/front/main.py

Lines 44 to 78 in 9d77ca2

async def top_handler(request: Request):
    urls = (f"http://{SVC_PAPER_HOST}:{SVC_PAPER_PORT}/paper",
            f"http://{SVC_AUTHOR_HOST}:{SVC_AUTHOR_PORT}/author")
    async with aiohttp.ClientSession() as session:
        json_raw = await fetch_all(session, urls)
    res_paper = json_raw[0]
    res_author = json_raw[1]
    paper_details = []
    for rp in res_paper:
        found_author = []
        for author_uuid in rp.get("author_uuid"):
            author = next(
                filter(
                    lambda x: author_uuid == x.get("uuid"),
                    res_author))
            found_author.append(author)
        author_list = [{
            "name": fa.get("last_name_ja") +
            fa.get("first_name_ja"),
            "uuid": fa.get("uuid")
        } for fa in found_author]
        paper_details.append({
            "uuid": rp.get("uuid", "#"),
            "title": rp.get("title", "No Title"),
            "author": author_list,
            "label": rp.get("label", "No Label"),
            "created_at": rp.get("created_at")
        })
    # sample_data = [{"title": "my title", "author": "my author", "label": "my label", "created_at": "2021/02/03"}]
    return templates.TemplateResponse("top.html", {
        "request": request,
        "papers": paper_details
    })

The Kubernetes logs show the following.

koyama@doktor-stg1:~/doktor-v2/deploy/front$ k logs deployment.apps/front-app-deploy
Found 3 pods, using pod/front-app-deploy-d8c7f785-swfsp
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
INFO:     10.42.0.0:33378 - "GET / HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 208, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.9/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 226, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 159, in run_endpoint_function
    return await dependant.call(**values)
RuntimeError: coroutine raised StopIteration
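
The last line of the traceback is consistent with the `next(filter(...))` call in `top_handler`: when a paper references an author UUID that the author service did not return, `next()` raises `StopIteration`, and Python turns a `StopIteration` that escapes a coroutine into `RuntimeError: coroutine raised StopIteration`. That reading is an inference from the code and the log, not a confirmed diagnosis. The sketch below reproduces the error and shows one possible fix using a dictionary lookup; `unsafe_lookup`, `reproduce`, and `resolve_authors` are hypothetical names.

```python
import asyncio


async def unsafe_lookup(author_uuid: str, authors: list[dict]) -> dict:
    # Same pattern as the handler: next() without a default value.
    return next(filter(lambda x: author_uuid == x.get("uuid"), authors))


def reproduce() -> str:
    """Run the unsafe lookup with no matching author and return the error text."""
    try:
        asyncio.run(unsafe_lookup("missing-uuid", []))
    except RuntimeError as exc:
        return str(exc)
    return "no error"


def resolve_authors(paper: dict, authors: list[dict]) -> list[dict]:
    """Return the authors referenced by the paper, skipping unknown UUIDs."""
    by_uuid = {a.get("uuid"): a for a in authors}
    found = []
    for author_uuid in paper.get("author_uuid") or []:
        author = by_uuid.get(author_uuid)
        if author is not None:
            found.append(author)
    return found


if __name__ == "__main__":
    print(reproduce())
    authors = [{"uuid": "a1", "last_name_ja": "山田", "first_name_ja": "太郎"}]
    print(resolve_authors({"author_uuid": ["a1", "a2"]}, authors))
```

Skipping unknown UUIDs keeps the top page rendering. Whether a dangling reference should instead be logged or reported is a separate design decision.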

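Create a script for adding initial data

To mirror production, staging does not load data when the containers start.

Without data we cannot verify that the system works, so we will create a data-loading script.

Approach: write a script that sends HTTP requests to paper and author.

Language: Python / Go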
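
A minimal Python sketch of this approach, using only the standard library, could look like the following. `POST /paper` appears in the paper-app logs, but the `/author` path, the base URLs, and the payload fields are assumptions: the field names are borrowed from the keys that the front service reads, and the real request schemas may differ.

```python
import json
import urllib.request


def build_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_seed_requests(author_url: str, paper_url: str,
                        authors: list[dict], papers: list[dict]
                        ) -> list[urllib.request.Request]:
    """Return requests for authors first, then papers, so references can resolve."""
    requests = [build_post(author_url, "/author", a) for a in authors]
    requests += [build_post(paper_url, "/paper", p) for p in papers]
    return requests


if __name__ == "__main__":
    # Hypothetical hosts and sample records, for illustration only.
    reqs = build_seed_requests(
        "http://author:8000", "http://paper:8000",
        authors=[{"first_name_ja": "太郎", "last_name_ja": "山田"}],
        papers=[{"title": "Sample report", "label": "demo", "author_uuid": []}],
    )
    for req in reqs:
        print(req.get_method(), req.full_url)
```

To actually load the data, each request can be passed to `urllib.request.urlopen`. Keeping construction separate from sending makes it easy to inspect the payloads first.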

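Identify the cause of the 504 that occurs on download

Downloading a PDF file stored in MinIO sometimes results in a 504.

  • Tune the keepalive value in the Nginx upstream block
  • VPN latency is a possible factor
  • The buffering size used by the MinIO SDK in the Python code is a possible factor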
