dohsimpson / kubernetes-doc-pdf

Kubernetes PDF Documentation

CSS 6.55% Python 56.75% Shell 36.70%

kubernetes-doc-pdf's Issues

How can I export the zh (Chinese) version of the Kubernetes docs to PDF?

I tried modifying kubernetes-doc.py, changing

directories_pairs = [("https://kubernetes.io/docs/{}/".format(n.lower()), n) for n in directories]

to

directories_pairs = [("https://kubernetes.io/zh/docs/{}/".format(n.lower()), n) for n in directories]

but the exported PDF contains unrecognizable characters.

Could you support exporting the Kubernetes documentation in the other officially supported languages?
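
The hand edit described in this issue could be parameterized instead. A minimal sketch, assuming the same kind of `directories` list used in kubernetes-doc.py; the helper name `build_directory_pairs` and its `lang` parameter are hypothetical and not part of the project:

```python
def build_directory_pairs(directories, lang=None):
    """Return (url, name) pairs for each docs directory, optionally localized."""
    # No language selects the default (English) docs; "zh" selects /zh/docs/.
    prefix = f"{lang}/" if lang else ""
    return [
        (f"https://kubernetes.io/{prefix}docs/{name.lower()}/", name)
        for name in directories
    ]


pairs = build_directory_pairs(["Setup", "Concepts"], lang="zh")
print(pairs[0][0])
```

This only changes which URLs are fetched. It does not by itself explain the garbled output, which may have a separate cause in the rendering step (for example fonts or file encoding; that is an untested guess).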

Update to 1.24?

Could you please update to 1.24?

Publish PDF as Release on github

Hi, thanks for taking the time to convert the docs into PDF. It is really useful.

It would be amazing if you published these as GitHub releases, so we could download the docs for a specific release and also know which release is current.

Get documentation in two different languages

Hi,

Some time ago I managed to get the documentation in Spanish using your solution. It works like a charm. The problem is that quite a lot of pages of the documentation are now written in Spanish. So, for me, it would be great to find a way to "print" all the existing docs in Spanish, and to print in English all those pages that are not translated.
Is it too much to ask for an update to the code to make this possible?
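
One way to picture the requested behaviour: for every English page, use the translated URL when it exists and fall back to the English one otherwise. The sketch below is only an illustration under that assumption; `localize`, `choose_links`, and the `exists` callback are hypothetical names, and the availability check is stubbed out so the example runs offline:

```python
def localize(url, lang):
    """Rewrite an English docs URL to its counterpart for `lang`."""
    return url.replace("kubernetes.io/docs", f"kubernetes.io/{lang}/docs", 1)


def choose_links(english_links, lang, exists):
    """Prefer the translated page when `exists` reports it is available."""
    chosen = []
    for en_url in english_links:
        translated = localize(en_url, lang)
        chosen.append(translated if exists(translated) else en_url)
    return chosen


# Stand-in for a real HTTP check: only this one translated page "exists".
available = {"https://kubernetes.io/es/docs/concepts/"}
links = choose_links(
    ["https://kubernetes.io/docs/concepts/", "https://kubernetes.io/docs/tasks/"],
    "es",
    lambda url: url in available,
)
print(links)
```

In a real run, `exists` would issue an HTTP request per URL, which is why a mixed-language export takes noticeably longer than a single-language one.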

Thanks a lot for your effort

Preserve top-level headings and nest sub-headings accordingly

The online documentation, for each resource, has main headings and sub-headings for related sub-topics. It would be nice to preserve this same hierarchy in these offline docs if at all possible so bookmarks are a little more contained.

(Reference guide ToC, online documentation)

(Reference guide ToC, offline documentation)

Error while running it

C:\workspace\github\dohsimpson\kubernetes-doc-pdf>pip install pipenv
Collecting pipenv
  Downloading https://files.pythonhosted.org/packages/13/b4/3ffa55f77161cff9a5220f162670f7c5eb00df52e00939e203f601b0f579/pipenv-2018.11.26-py3-none-any.whl (5.2MB)
    100% |████████████████████████████████| 5.2MB 1.6MB/s
Requirement already satisfied: pip>=9.0.1 in c:\users\poyatm01\appdata\local\programs\python\python37-32\lib\site-packages (from pipenv) (19.0.3)
Collecting certifi (from pipenv)
  Downloading https://files.pythonhosted.org/packages/69/1b/b853c7a9d4f6a6d00749e94eb6f3a041e342a885b87340b79c1ef73e3a78/certifi-2019.6.16-py2.py3-none-any.whl (157kB)
    100% |████████████████████████████████| 163kB 1.4MB/s
Collecting virtualenv (from pipenv)
  Downloading https://files.pythonhosted.org/packages/8b/12/8d4f45b8962b03ac9efefe5ed5053f6b29334d83e438b4fe379d21c0cb8e/virtualenv-16.7.5-py2.py3-none-any.whl (3.3MB)
    100% |████████████████████████████████| 3.3MB 1.1MB/s
Requirement already satisfied: setuptools>=36.2.1 in c:\users\poyatm01\appdata\local\programs\python\python37-32\lib\site-packages (from pipenv) (40.8.0)
Collecting virtualenv-clone>=0.2.5 (from pipenv)
  Downloading https://files.pythonhosted.org/packages/ba/f8/50c2b7dbc99e05fce5e5b9d9a31f37c988c99acd4e8dedd720b7b8d4011d/virtualenv_clone-0.5.3-py2.py3-none-any.whl
Installing collected packages: certifi, virtualenv, virtualenv-clone, pipenv
Successfully installed certifi-2019.6.16 pipenv-2018.11.26 virtualenv-16.7.5 virtualenv-clone-0.5.3
You are using pip version 19.0.3, however version 19.2.3 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\workspace\github\dohsimpson\kubernetes-doc-pdf>pipenv install
Creating a virtualenv for this project…
Pipfile: C:\workspace\github\dohsimpson\kubernetes-doc-pdf\Pipfile
Using C:/Users/Poyatm01/AppData/Local/Programs/Python/Python37-32/python.exe (3.7.3) to create virtualenv…
[=   ] Creating virtual environment...Already using interpreter C:\Users\Poyatm01\AppData\Local\Programs\Python\Python37-32\python.exe
Using base prefix 'C:\\Users\\Poyatm01\\AppData\\Local\\Programs\\Python\\Python37-32'
New python executable in C:\Users\Poyatm01\.virtualenvs\kubernetes-doc-pdf-f5LapLRh\Scripts\python.exe
Installing setuptools, pip, wheel...
done.
Running virtualenv with interpreter C:/Users/Poyatm01/AppData/Local/Programs/Python/Python37-32/python.exe

Successfully created virtual environment!
Virtualenv location: C:\Users\Poyatm01\.virtualenvs\kubernetes-doc-pdf-f5LapLRh
Installing dependencies from Pipfile.lock (c42e03)…
  ================================ 22/22 - 00:00:19
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.

C:\workspace\github\dohsimpson\kubernetes-doc-pdf>pipenv shell
Launching subshell in virtual environment…
Microsoft Windows [Version 10.0.16299.1331]
(c) 2017 Microsoft Corporation. All rights reserved.

(kubernetes-doc-pdf-f5LapLRh) C:\workspace\github\dohsimpson\kubernetes-doc-pdf>python kubernetes-doc.py
Setup
downloading...
Traceback (most recent call last):
  File "kubernetes-doc.py", line 63, in <module>
    generate_directory_pdf(url, name)
  File "kubernetes-doc.py", line 39, in generate_directory_pdf
    f.write(html)
  File "C:\Users\Poyatm01\.virtualenvs\kubernetes-doc-pdf-f5LapLRh\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2714' in position 5290: character maps to <undefined>
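
The traceback shows `f.write(html)` failing inside `cp1252.py`, which means the file was opened with the Windows default encoding, and that encoding cannot represent U+2714. A minimal sketch of the usual remedy, passing `encoding="utf-8"` to `open()`; the output path and HTML string here are placeholders, not taken from the project:

```python
import os
import tempfile

html = "<p>Done \u2714</p>"  # contains the character named in the traceback

path = os.path.join(tempfile.mkdtemp(), "Setup.html")  # placeholder output path

# An explicit encoding avoids the platform default (cp1252 in the log above).
with open(path, "wt", encoding="utf-8") as f:
    f.write(html)

with open(path, "rt", encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == html)
```

Whatever reads the HTML file afterwards would also need to treat it as UTF-8.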

Recommended Labels in Concepts.pdf seems to be in the wrong place

On page 4 of Concepts.pdf I can see Recommended Labels, which should appear on much later pages.

Tables in pre-generated PDFs not reflowed

First, I LOVE that you've done this, so thank you! I'm finding that many tables in the PDF docs you've stored in the repo have their rightmost columns truncated when viewed in Acrobat Reader on Android. I've not tried running the script on demand. Is this something that's fixable?

Support for changing the language

Hi, thanks a lot for all this effort. It is very useful.

I've read that somebody asked about using the Chinese version of the documentation. I've followed some of the changes made to the kubernetes-doc.py file to point it to the Spanish version: "https://kubernetes.io/es/docs/". The problem is that after running "./gen_ref_docs.sh", some parts of the files are empty. An example is the file Setup.pdf attached here:
Setup.pdf

I'd like some help to fix it and get a useful PDF file.

Thanks in advance!

doesn't include api reference?

I don't see the API reference docs in these PDFs:

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.17/

ePub/mobi formats

I've uploaded epub/mobi formats at https://github.com/tha2015/kubernetes-doc-epub/tree/master/epub
The script is also available.
I hope this will be useful for someone.

IndexError: tuple index out of range

Hi, I ran into a problem: when I used this tool to convert the website to PDF, a fatal error occurred. Could you give me some advice? Thanks. The details are below:


ERROR: Failed to load image at "https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/reference/tools.md?pixel" (URLError: )
INFO: Step 5 - Creating layout - Page 1
INFO: Step 5 - Creating layout - Page 2
INFO: Step 5 - Creating layout - Page 3
INFO: Step 5 - Creating layout - Page 4
INFO: Step 5 - Creating layout - Page 5
INFO: Step 5 - Creating layout - Page 6
INFO: Step 5 - Creating layout - Page 7
INFO: Step 5 - Creating layout - Page 8
INFO: Step 5 - Creating layout - Page 9
INFO: Step 5 - Creating layout - Page 10
INFO: Step 5 - Creating layout - Page 11
INFO: Step 5 - Creating layout - Page 12
INFO: Step 5 - Creating layout - Page 13
Traceback (most recent call last):
  File "/usr/bin/weasyprint", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.6/site-packages/weasyprint/__main__.py", line 212, in main
    getattr(html, 'write_' + format_)(output, **kwargs)
  File "/usr/lib/python3.6/site-packages/weasyprint/__init__.py", line 211, in write_pdf
    font_config=font_config).write_pdf(
  File "/usr/lib/python3.6/site-packages/weasyprint/__init__.py", line 168, in render
    font_config)
  File "/usr/lib/python3.6/site-packages/weasyprint/document.py", line 393, in _render
    [Page(page_box, enable_hinting) for page_box in page_boxes],
  File "/usr/lib/python3.6/site-packages/weasyprint/document.py", line 393, in <listcomp>
    [Page(page_box, enable_hinting) for page_box in page_boxes],
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/__init__.py", line 126, in layout_document
    pages = list(make_all_pages(context, root_box, html, pages))
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/pages.py", line 803, in make_all_pages
    page, resume_at = remake_page(i, context, root_box, html)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/pages.py", line 742, in remake_page
    page_number, page_state)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/pages.py", line 553, in make_page
    positioned_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 63, in block_level_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 77, in block_level_layout_switch
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 130, in block_box_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 510, in block_container_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 63, in block_level_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 77, in block_level_layout_switch
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 130, in block_box_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 510, in block_container_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 63, in block_level_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 77, in block_level_layout_switch
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 130, in block_box_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 510, in block_container_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 63, in block_level_layout
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 77, in block_level_layout_switch
    page_is_empty, absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 130, in block_box_layout
    absolute_boxes, fixed_boxes, adjoining_margins)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/blocks.py", line 376, in block_container_layout
    for line, resume_at in lines_iterator:
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/inlines.py", line 53, in iter_line_boxes
    absolute_boxes, fixed_boxes, first_letter_style)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/inlines.py", line 70, in get_next_linebox
    skip_stack = skip_first_whitespace(linebox, skip_stack)
  File "/usr/lib/python3.6/site-packages/weasyprint/layout/inlines.py", line 210, in skip_first_whitespace
    result = skip_first_whitespace(box.children[index], next_skip_stack)
IndexError: tuple index out of range
+ docker rm -f weasy

Two languages version

Hi @dohsimpson

I didn't want to bother you with all this about support for different languages, but I just want to let you know that I was finally able to get a two-language version of the app working.
The first thing to know is that this is the first time I have tried to code a "complex" app. Actually, I had only tried writing some bash scripts once before, so you will understand why this code is so bad :)

This version takes much longer to fetch the docs from the website, as it checks whether each URL from the English version also exists in the second language. It also catches URLs that exist only in the second language and not in English. This way, I think you should get all the existing web pages, in English and in the language you choose.

The only thing to modify to get a language other than "es" is to change the "lang" variable in the first lines of the code and set it to another one ("it", for example). This way, you should be able to download all the web pages that exist in the "it" language, and the rest will be in English.

The code will create a new folder called "tmp/links_{concept}", where concept is one of "setup|reference|tasks|tutorials|concepts". Inside each of these folders, you will find the URLs of all the web pages it downloaded...

Please feel free to change anything in the code, as I'm more than sure it can be improved in a lot of ways.

Also, thanks for making this project possible, as now I've learned to code a little better :)

Here is the code:

import requests_html as rh
import os
import subprocess
import requests
import json
from pathlib import Path

# to change language, set the content of "lang" to iso code. Also, please check that it already exist on k8s web
lang = "es"

def generate_directory_pdf(url1, name, s=None):
    # some needed variables...
    mydir = Path(f"tmp/links_{name}")
    mydir.mkdir(parents=True, exist_ok=True)
    final_links_to_download = f"tmp/links_{name}/links_to_download.json"
    url2 = f"https://kubernetes.io/{lang}/docs/{name}"

    s = rh.HTMLSession() if not s else s
    r1 = s.get(url1)
    r2 = s.get(url2)
    html = ""
    anchors1 = r1.html.find('.td-sidebar-link')
    anchors2 = r2.html.find('.td-sidebar-link')
    links_en = [a.absolute_links.pop() for a in anchors1 if a.element.tag == 'a']
    links_es = [a.absolute_links.pop() for a in anchors2 if a.element.tag == 'a']

    links_en_uniq_a_comprobar = []
    for i in links_en:
        if i not in links_en_uniq_a_comprobar:
            links_en_uniq_a_comprobar.append(i)

    links_solo_es_uniq = []
    for i in links_es:
        if i not in links_solo_es_uniq:
            links_solo_es_uniq.append(i)

    # Map each English URL to its counterpart in the chosen language (f-string so {lang} is interpolated)
    links_es_uniq_a_comprobar = [link.replace("kubernetes.io/docs", f"kubernetes.io/{lang}/docs") for link in links_en_uniq_a_comprobar]

    def check_url(tocheck):
        try:
            response = requests.get(tocheck, timeout=5)
            if response.status_code == 200:
                return True
            else:
                return False
        except requests.RequestException:
            return False

    checked_links_mixed = []
    for english, spanish in zip(links_en_uniq_a_comprobar, links_es_uniq_a_comprobar):
        if check_url(spanish):
            checked_links_mixed.append(spanish)
        else:
            checked_links_mixed.append(english)

    mixed_links_to_uniq = checked_links_mixed + links_solo_es_uniq
    filtered_mixed_links_for_lambda = []
    for i in mixed_links_to_uniq:
        if i not in filtered_mixed_links_for_lambda:
            filtered_mixed_links_for_lambda.append(i)


    links_post_lambda = filter(lambda href: href.startswith(url1) or href.startswith(url2), filtered_mixed_links_for_lambda)
    links_post_lambda_list = list(links_post_lambda)


    with open(final_links_to_download, 'w') as output_file:
                json.dump(links_post_lambda_list, output_file, indent=4)

    print("Downloading content from links...")
    cwd = os.getcwd()
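    for l1 in links_post_lambda_list:
        r2 = s.get(l1)
        div = r2.html.find('.td-content', first=True, clean=True)
        if div:
            html += div.html
    # Write once after the loop; UTF-8 avoids encode errors from the platform default codec
    with open("{}/{}.html".format(cwd, name), "wt", encoding="utf-8") as f:
        f.write(html)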

    print("generating pdf in " + name )
    subprocess.run(["{}/weasy_print.sh".format(cwd), name])


if __name__ == '__main__':
    s = rh.HTMLSession()
    directories = [\
                   "setup",
                   "concepts",
                   "tasks",
                   "tutorials",
                   "reference",
                   ]
    directories_pairs = [("https://kubernetes.io/docs/{}/".format(n.lower()), n) for n in directories]
    for url1, name in directories_pairs:
        print("Working with the content in url : " + url1)
        generate_directory_pdf(url1, name)
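
The de-duplication loops in the script above keep the first occurrence of each URL and preserve order. A more compact equivalent, offered as a sketch rather than a change to the posted script, relies on `dict.fromkeys`, since dictionaries preserve insertion order in Python 3.7 and later; the helper name `unique_in_order` is hypothetical:

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first occurrences in order."""
    return list(dict.fromkeys(items))


links = [
    "https://kubernetes.io/docs/setup/",
    "https://kubernetes.io/docs/tasks/",
    "https://kubernetes.io/docs/setup/",
]
unique = unique_in_order(links)
print(unique)
```

Besides being shorter, this avoids the repeated `if i not in list` scan, which grows slower as the list of links gets longer.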

I tried to clean up the code as much as I could, because my original version was full of tests and different attempts to understand how things work. If you run into any trouble with the clean code, here is the dirty one, which is the one I tested the most:

import requests_html as rh
import os
# import pypandoc
import subprocess
import time
import requests
import json
from pathlib import Path

def generate_directory_pdf(url1, name, s=None):
    mydir = Path(f"tmp/links_{name}")
    mydir.mkdir(parents=True, exist_ok=True)
    divss = f"tmp/links_{name}/divs.json"
    file_links_en_a_comprobar = f"tmp/links_{name}/listado_en_a_comprobar.json"
    file_links_es_a_comprobar = f"tmp/links_{name}/listado_es_a_comprobar.json"
    file_solo_es_comprobados = f"tmp/links_{name}/file_solo_es_comprobados.json"
    file_solo_en_comprobados = f"tmp/links_{name}/file_solo_en_comprobados.json"
    file_filtered_mixed_links_for_lambda = f"tmp/links_{name}/filtered_mixed_links_for_lambda.json"
    file_links_post_lambda_to_download = f"tmp/links_{name}/Final_post_lambda_links_to_download.json"
    file_links_post_lambda_LIST_to_download = f"tmp/links_{name}/Final_post_lambda_LIST_links_to_download.json"
    lang = "es"


    url2 = f"https://kubernetes.io/{lang}/docs/{name}"
    # Collect all the references found at each URL:
    s = rh.HTMLSession() if not s else s
    r1 = s.get(url1)
    r2 = s.get(url2)
    html = ""
    anchors1 = r1.html.find('.td-sidebar-link')
    anchors2 = r2.html.find('.td-sidebar-link')
    links_en = [a.absolute_links.pop() for a in anchors1 if a.element.tag == 'a']
    links_es = [a.absolute_links.pop() for a in anchors2 if a.element.tag == 'a'] # all the links found in Spanish

    # De-duplicate the full list of URLs
    links_en_uniq_a_comprobar = []
    for i in links_en:
        if i not in links_en_uniq_a_comprobar:
            links_en_uniq_a_comprobar.append(i) # Drop repeats of the same URL

    links_solo_es_uniq = []
    for i in links_es:
        if i not in links_solo_es_uniq:
            links_solo_es_uniq.append(i) # Drop repeats of the same URL



    # Build a list with every English URL converted to its Spanish counterpart:
    links_en_converted_to_es = []
    links_en_converted_to_es = [link.replace("kubernetes.io/docs", "kubernetes.io/es/docs") for link in links_en_uniq_a_comprobar]

    # Rename to keep the naming consistent between languages:
    links_es_uniq_a_comprobar = links_en_converted_to_es


    # Dump links_en_uniq_a_comprobar to the listado_en file
    with open(file_links_en_a_comprobar, 'w') as output_file:
                json.dump(links_en_uniq_a_comprobar, output_file, indent=4)

    # Dump the converted Spanish URLs to the listado_es file
    with open(file_links_es_a_comprobar, 'w') as output_file:
                json.dump(links_es_uniq_a_comprobar, output_file, indent=4)

    # Function that checks whether a URL is reachable
    def check_url(tocheck):
        try:
            response = requests.get(tocheck, timeout=5)
            if response.status_code == 200:
                return True
            else:
                return False
        except requests.RequestException:
            return False

    # Check that the English and Spanish links are reachable
    # checked_links_mixed gets every link that exists in Spanish, and the rest in English
    checked_links_mixed = []
    links_es_comprobados = []
    links_en_comprobados = []
    for english, spanish in zip(links_en_uniq_a_comprobar, links_es_uniq_a_comprobar):
        #print(f"Testing the urls: {english}  and  {spanish}")
        if check_url(spanish):
            checked_links_mixed.append(spanish)
            links_es_comprobados.append(spanish)

            #print(f"Adding the url {spanish} to Spanish and discarding {english}")
        else:
            checked_links_mixed.append(english)
            links_en_comprobados.append(english)

    print("The list of CHECKED SPANISH links can now be inspected in the file file_solo_es_comprobados")
    with open(file_solo_es_comprobados, 'w') as output_file:
                json.dump(links_es_comprobados, output_file, indent=4)

    print("The list of CHECKED ENGLISH links can now be inspected in the file file_solo_en_comprobados")
    with open(file_solo_en_comprobados, 'w') as output_file:
                json.dump(links_en_comprobados, output_file, indent=4)
    time.sleep(15)


    # Add the links that were found ONLY in Spanish
    mixed_links_to_uniq = checked_links_mixed + links_solo_es_uniq

    # Remove possible repeats between the mixed links and links_solo_es_uniq
    filtered_mixed_links_for_lambda = []
    for i in mixed_links_to_uniq:
        if i not in filtered_mixed_links_for_lambda:
            filtered_mixed_links_for_lambda.append(i) # Drop repeats of the same URL


# After this, only the reachable URLs should remain, mixed across the two languages

    #for i in links_to_download:
    #    print(f"Links compared and added to -Spanish only-, one by one (before lambda): {i}")

    print("Length of filtered_mixed_links_for_lambda:", len(filtered_mixed_links_for_lambda))
    print("final_total_links:", filtered_mixed_links_for_lambda)
    time.sleep(10)

    # Write the final list of URLs to download to the file, BEFORE the lambda filter:
    with open(file_filtered_mixed_links_for_lambda, 'w') as output_file:  # Renamed 'file' to 'output_file'
                json.dump(filtered_mixed_links_for_lambda, output_file, indent=4)
    print("The links before the lambda filter can now be inspected in file_filtered_mixed_links_for_lambda")
    time.sleep(15)

    #yinks = filter(lambda href: href.startswith(url2), links_variable)
    links_post_lambda = filter(lambda href: href.startswith(url1) or href.startswith(url2), filtered_mixed_links_for_lambda)
    links_post_lambda_list = list(links_post_lambda)


    # Inspect what is left in the list once the lambda filter has run:
    with open(file_links_post_lambda_LIST_to_download, 'w') as output_file:  # Renamed 'file' to 'output_file'
                json.dump(links_post_lambda_list, output_file, indent=4)


    print("The list after the lambda filter can now be inspected in file_links_post_lambda_LIST_to_download")
    print("Length of links_post_lambda_list AFTER the lambda filter:", len(links_post_lambda_list))

    input("Press Enter to continue...")

    print("downloading...")
    cwd = os.getcwd()
    for l1 in links_post_lambda_list:
        print(f"Final Links post Lambda: {l1}")
        r2 = s.get(l1)
        div = r2.html.find('.td-content', first=True, clean=True)
        print(f"Looking for divs in {l1}. This is a div: {div}.")
        if div:
            print(f"Div exists: {div}")
            html += div.html
        with open("{}/{}.html".format(cwd, name), "wt") as f:
            f.write(html)

    print("generating pdf in " + name )
    subprocess.run(["{}/weasy_print.sh".format(cwd), name])

if __name__ == '__main__':
    s = rh.HTMLSession()
    directories = [\
                   "setup",
                   "concepts",
                   "tasks",
                   "tutorials",
                   "reference",
                   ]
    directories_pairs = [("https://kubernetes.io/docs/{}/".format(n.lower()), n) for n in directories]
    for url1, name in directories_pairs:
        print("URL: " + url1, "Directory: " + name)
        print(name)
        generate_directory_pdf(url1, name)
        print("Generated directory with " + url1 + " and " + name )

I hope it works for you.

Best regards!!

Redact "Feedback" sub-sections

At the end of each section of the Kubernetes documentation appears a "Feedback" sub-section designed, ostensibly, to provide feedback on the quality of the current page. This makes perfect sense when viewed live, in a web browser. It, however, only serves to bloat (if slightly) an offline archive of the same documentation occupying, when present, ~20–25% of a given PDF page's layout.

Removing it would make these already nice offline docs look even better, more streamlined, and less cluttered than they currently are. If this change is implemented, I would suggest making the removal the default for your auto-generated PDFs, with a manual switch for users who wish to retain the section.
