
ha_pdf's Introduction

Home Assistant PDF File Sensor

The pdf sensor platform parses and searches the text content of local PDF files using PyPDF2.

Installation

Clone this repo into the custom_components directory:

cd [HA_HOME]/custom_components
git clone https://github.com/emcniece/ha_pdf.git

Configuration

Home Assistant must first be allowed to access the directory that contains the target PDF file. Add that directory to allowlist_external_dirs in configuration.yaml. In this example, the PDF is located at /config/my.pdf:

homeassistant:
  allowlist_external_dirs:
    - /config
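The allowlist check can be pictured with a short sketch. It assumes a path is allowed when it is one of the listed directories or sits inside one; is_path_allowed is a hypothetical helper for illustration, not Home Assistant's own code, and it does not resolve symlinks.

```python
import os.path
from typing import Iterable


def is_path_allowed(path: str, allowlist: Iterable[str]) -> bool:
    """Return True if `path` is one of the allowlisted directories or lies inside one.

    Hypothetical, simplified sketch: paths are normalized as strings only;
    symlinks are not resolved and the filesystem is not touched.
    """
    target = os.path.abspath(path)
    for allowed in allowlist:
        base = os.path.abspath(allowed)
        # commonpath compares whole path components, so it equals `base`
        # only when `target` is `base` itself or somewhere below it.
        if os.path.commonpath([target, base]) == base:
            return True
    return False


print(is_path_allowed("/config/my.pdf", ["/config"]))   # True
print(is_path_allowed("/config2/my.pdf", ["/config"]))  # False: a sibling, not a child
```

Comparing whole components rather than string prefixes is what keeps /config2 from slipping through an allowlist entry of /config.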

Next, add the sensor definition:

sensor:
  - platform: pdf
    name: My PDF Sensor
    file_path: /config/my.pdf

By default, this configuration extracts all text found on the first page of the PDF.
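Conceptually, the default behavior boils down to opening the file and extracting the text of page 0. The sketch below assumes the PyPDF2 3.x API (PdfReader, reader.pages, page.extract_text()); extract_page_text and read_pdf_page are hypothetical names for illustration, not functions from this component's sensor.py.

```python
from typing import Any


def extract_page_text(reader: Any, page_index: int = 0) -> str:
    """Return the text of one page from a PyPDF2 3.x-style reader.

    `reader` is anything with a list-like `.pages` attribute whose items
    provide `extract_text()`, such as PyPDF2.PdfReader.
    """
    page = reader.pages[page_index]  # pdf_page is zero-based: 0 is the first page
    return page.extract_text()


def read_pdf_page(file_path: str, page_index: int = 0) -> str:
    """Open a local PDF with PyPDF2 (3.x API) and extract one page's text."""
    from PyPDF2 import PdfReader  # imported lazily; requires PyPDF2 to be installed

    return extract_page_text(PdfReader(file_path), page_index)
```

With PyPDF2 installed, read_pdf_page("/config/my.pdf") would return the first page's text. Passing the reader in as a parameter keeps the page logic testable without a real PDF.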

Configuration Variables

file_path (required)

Path to a local PDF file


unit_of_measurement (optional)

Measurement unit to associate with the rendered value


pdf_page (optional)

Zero-based index of the PDF page to search (0 is the first page). Default: 0


regex_search (optional)

Regular expression with capture groups used to search the PDF text.


regex_match_index (optional)

Regular expression capture group index to render as the value. Default: 0

Index 0 returns the whole matched string. Indexes 1 and higher return the corresponding capture group.
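These index semantics match Python's regex match groups. A minimal sketch, assuming the search behaves like re.search; select_match and the sample invoice line are hypothetical, and returning None for a missing match or an out-of-range index is an illustrative choice, not documented sensor behavior.

```python
import re
from typing import Optional


def select_match(text: str, pattern: str, match_index: int = 0) -> Optional[str]:
    """Hypothetical helper mirroring regex_search + regex_match_index.

    Index 0 is the whole match; 1..n are capture groups. Returns None when
    the pattern does not match or the index names no group.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    if not 0 <= match_index <= len(match.groups()):
        return None  # match.group() would raise IndexError("no such group") here
    return match.group(match_index)


line = "Invoice total: $ 33.24 due"
pattern = r"total:\s+\$\s+([\d.]+)"
print(select_match(line, pattern, 0))  # total: $ 33.24
print(select_match(line, pattern, 1))  # 33.24
print(select_match(line, pattern, 2))  # None
```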


value_template (optional)

Template applied to the value after the regex search.

  • {{ value }}: parsed text

Full Configuration Example

The PDF in this example contains the following line of text:

Water Consumption Charge     15  x  $  2.2159             33.24 --------------Balance

Three sensors, each with a different regex_match_index, extract the three numeric values from that line:

# Example configuration.yaml entry
homeassistant:
  allowlist_external_dirs:
    - /config

sensor:
  - platform: pdf
    name: Water Usage Volume
    file_path: /config/water-bill.pdf
    unit_of_measurement: m3
    pdf_page: 0
    regex_search: 'Water Consumption Charge\s+([\d.]+)\s+x\s+\$\s+([\d.]+)\s+([\d.]+)\s-+'
    regex_match_index: 1

  - platform: pdf
    name: Water Usage Billing Rate
    file_path: /config/water-bill.pdf
    unit_of_measurement: $
    pdf_page: 0
    regex_search: 'Water Consumption Charge\s+([\d.]+)\s+x\s+\$\s+([\d.]+)\s+([\d.]+)\s-+'
    regex_match_index: 2

  - platform: pdf
    name: Water Usage Total Cost
    file_path: /config/water-bill.pdf
    unit_of_measurement: $
    pdf_page: 0
    regex_search: 'Water Consumption Charge\s+([\d.]+)\s+x\s+\$\s+([\d.]+)\s+([\d.]+)\s-+'
    regex_match_index: 3
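The regex can be checked against the sample line before it goes into configuration.yaml. A sketch using Python's re module, assuming the sensor's search behaves like re.search; because YAML single-quoted strings keep backslashes literally, a Python raw string holds the same pattern.

```python
import re

# The sample line from the bill and the regex shared by all three sensors.
LINE = "Water Consumption Charge     15  x  $  2.2159             33.24 --------------Balance"
PATTERN = r"Water Consumption Charge\s+([\d.]+)\s+x\s+\$\s+([\d.]+)\s+([\d.]+)\s-+"

match = re.search(PATTERN, LINE)
assert match is not None, "pattern did not match the sample line"

# regex_match_index 1, 2 and 3 select volume, rate and total cost respectively.
sensors = {1: "Water Usage Volume", 2: "Water Usage Billing Rate", 3: "Water Usage Total Cost"}
for index, name in sensors.items():
    print(f"{name}: {match.group(index)}")
# Water Usage Volume: 15
# Water Usage Billing Rate: 2.2159
# Water Usage Total Cost: 33.24

# Sanity check: volume x rate, rounded to cents, equals the billed total.
volume, rate, total = (float(match.group(i)) for i in (1, 2, 3))
assert round(volume * rate, 2) == total
```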

ha_pdf's People

Contributors

emcniece, dennisfrett

ha_pdf's Issues

Unable to read PDF file

The file exists and opens in a PDF reader. The following error is thrown:

Logger: custom_components.ha_pdf.sensor
Source: custom_components/ha_pdf/sensor.py:122
Integration: ha_pdf
First occurred: January 23, 2024 at 3:09:28 AM (15292 occurrences)
Last logged: 12:05:12 PM

File or data not present at the moment: engie.pdf
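Since the file opens fine in a PDF reader, a useful next step is to check the configured path from the same environment Home Assistant runs in (for example, inside its container), because host and container paths can differ. A hypothetical troubleshooting sketch using only the standard library; diagnose_pdf_path is not part of the component.

```python
import os
from typing import List


def diagnose_pdf_path(file_path: str) -> List[str]:
    """Return likely problems with a configured file_path; an empty list means none were found.

    Hypothetical helper: run it where Home Assistant runs, since a path valid
    on the host may not exist inside a container.
    """
    problems = []
    if not os.path.isabs(file_path):
        problems.append("path is relative; the README examples use absolute paths like /config/my.pdf")
    if not os.path.exists(file_path):
        problems.append("path does not exist in this environment")
    elif not os.path.isfile(file_path):
        problems.append("path exists but is not a regular file")
    elif not os.access(file_path, os.R_OK):
        problems.append("file exists but is not readable by this process")
    return problems


print(diagnose_pdf_path("/config/engie.pdf"))
```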

pdf: Error on device update! since HA 2022.12.0 update

Sensor setup is failing since Home Assistant release 2022.12.0 with the following error message in the log:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 503, in _async_add_entity
    await entity.async_device_update(warning=False)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 729, in async_device_update
    await task
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/ha_pdf/sensor.py", line 114, in update
    page = pdf.getPage(int(self._pdf_page))
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_reader.py", line 476, in getPage
    deprecation_with_replacement(
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_utils.py", line 369, in deprecation_with_replacement
    deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_utils.py", line 351, in deprecation
    raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: reader.getPage(pageNumber) is deprecated and was removed in PyPDF2 3.0.0. Use reader.pages[page_number] instead.

Changing line 114 of sensor.py to the following fixes the issue:
page = pdf.pages[int(self._pdf_page)]
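The one-line change switches to the reader.pages sequence that PyPDF2 3.x provides. A slightly more defensive sketch also validates pdf_page before indexing, so a bad page number produces a readable error; get_page is a hypothetical helper, and raising ValueError is an assumed design choice, not the component's behavior.

```python
from typing import Any


def get_page(reader: Any, pdf_page: Any) -> Any:
    """Return the requested page from the PyPDF2 3.x `reader.pages` sequence.

    `pdf_page` may arrive from YAML as an int or a numeric string, hence int().
    Raises ValueError with a clear message instead of a bare IndexError.
    """
    index = int(pdf_page)
    page_count = len(reader.pages)
    if not 0 <= index < page_count:
        raise ValueError(f"pdf_page {index} is out of range; the PDF has {page_count} page(s)")
    return reader.pages[index]
```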

Cannot Add Integration

When I try to install this, I get the following error:
<Integration emcniece/ha_pdf> Repository structure for v1.1.2 is not compliant

New install, not working?

Hello,

I have just tried to install ha_pdf following the documentation.

I cloned the repository, and the files are in the correct folder:

/Home_Assistant/config/custom_components$ ls ha_pdf -l
total 24
-rw-rw-r-- 1 frederic frederic   30 May  9 11:11 __init__.py
-rw-rw-r-- 1 frederic frederic  191 May  9 11:11 manifest.json
drwxr-xr-x 2 root     root     4096 May  9 11:18 __pycache__
-rw-rw-r-- 1 frederic frederic 2803 May  9 11:11 README.md
-rw-rw-r-- 1 frederic frederic 4485 May  9 11:11 sensor.py

I configured Home Assistant as follows:

homeassistant:
  packages: !include_dir_named packages
  whitelist_external_dirs:
    - /config/www
    - /config/files-fred
  allowlist_external_dirs:
    - /config/files-fred/downloads/energy

sensor:
  - platform: pdf
    name: "Test PDF Sensor"
    file_path: /config/files-fred/downloads/energy/ipsum.pdf

The PDF file, ipsum.pdf, contains plain text.

The "Test PDF Sensor" entity does not show up anywhere.

Log output:

Logger: homeassistant.components.sensor
Source: custom_components/ha_pdf/sensor.py:117
Integration: Sensor (documentation, issues)
First occurred: 11:30:27 (1 occurrences)
Last logged: 11:30:27

pdf: Error on device update!
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/entity_platform.py", line 521, in _async_add_entity
    await entity.async_device_update(warning=False)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 784, in async_device_update
    await coro
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/config/custom_components/ha_pdf/sensor.py", line 117, in update
    text = page.extractText()
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_page.py", line 1899, in extractText
    deprecation_with_replacement("extractText", "extract_text", "3.0.0")
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_utils.py", line 369, in deprecation_with_replacement
    deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
  File "/usr/local/lib/python3.10/site-packages/PyPDF2/_utils.py", line 351, in deprecation
    raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: extractText is deprecated and was removed in PyPDF2 3.0.0. Use extract_text instead.

Is there anything I am doing wrong?
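The failing call is text = page.extractText() on line 117 of sensor.py; as the error message states, PyPDF2 3.0.0 removed extractText in favor of extract_text. A minimal sketch of a version-tolerant replacement, assuming the page object exposes one of those two methods; page_text is a hypothetical helper name.

```python
from typing import Any


def page_text(page: Any) -> str:
    """Extract text from a PyPDF2 page object across API generations.

    Prefers extract_text(), the name PyPDF2 3.x requires; falls back to the
    camelCase extractText() only for older releases that lack extract_text().
    """
    extract = getattr(page, "extract_text", None)
    if extract is None:
        extract = page.extractText  # legacy name, removed in PyPDF2 3.0.0
    return extract()
```

Preferring the new name first matters: on PyPDF2 3.x the old name still exists but raises DeprecationError, as the traceback shows.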
