holgerhees / smartserver

SmartHome Server deployment setup

Home Page: http://www.intranet-of-things.com/smarthome/infrastructure/server/setup/

License: GNU General Public License v3.0

JavaScript 14.79% Shell 2.72% PHP 6.16% CSS 3.55% Perl 0.20% Python 42.43% C 0.18% Dockerfile 0.93% HTML 28.74% C++ 0.02% Pawn 0.09% FLUX 0.01% Smarty 0.01% NASL 0.08% BitBake 0.09%
vagrant ansible smarthome linux server alma rockylinux suse ubuntu homeserver

smartserver's Issues

No VLANs in my librenms

I'm trying to resolve the alerts that I currently have. With this issue I would like to let you know about a special situation; maybe you can take it into account and make the code more robust.

The alert is System service service state, with the Prometheus expression system_service_state{job="system_service",hostname=""} == 0. It gets triggered because on my system LibreNMS has no VLANs (I don't know whether it's a misconfiguration or whether my switch, a TP-Link, does not provide this information via SNMP).

The call returns 404:

# curl -H 'X-Auth-Token: XXXXXX' http://librenms:8000/api/v0/resources/vlans
{
    "status": "error",
    "message": "VLANs do not exist"
}

If I modify the code to print the stacktrace I see:

Traceback (most recent call last):
  File "/opt/system_service/lib/scanner/handler/librenms.py", line 45, in _run
    self._processLibreNMS()
  File "/opt/system_service/lib/scanner/handler/librenms.py", line 80, in _processLibreNMS
    self._processVLANs()
  File "/opt/system_service/lib/scanner/handler/librenms.py", line 160, in _processVLANs
    _vlan_json = self._get("resources/vlans")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/system_service/lib/scanner/handler/librenms.py", line 319, in _get
    raise NetworkException("Got wrong response status code: {}".format(r.status_code), self.config.startup_error_timeout if not self._isInitialized() else self.config.remote_suspend_timeout)
lib.scanner.handler.librenms.NetworkException: Got wrong response status code: 404

For now I have commented out the calls to _processVLANs and _processFDP, and the alert is gone.

I don't know whether this situation can occur for others (in which case the code would need to handle it) or whether I need to configure LibreNMS or the switch to report VLANs.
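One possible way to make this more robust, as a minimal sketch rather than the project's actual code: treat a 404 whose JSON body says "error" as "no such objects" and return an empty list. The function name parse_optional_resource, the "vlans" response key and the stand-in NetworkException class are assumptions for illustration; only the status code and the error body come from the output above.

```python
import json


class NetworkException(Exception):
    """Stand-in for the handler's NetworkException seen in the traceback."""


def parse_optional_resource(status_code, body, key):
    """Return the list stored under `key`, or [] if the API reports none.

    Assumption: a 404 with a JSON body whose status is "error" means
    "no such objects exist", as in the VLAN response quoted above.
    """
    if status_code == 200:
        return json.loads(body).get(key, [])
    if status_code == 404:
        try:
            payload = json.loads(body)
        except ValueError:
            payload = None  # not JSON, so treat it as a real failure
        if isinstance(payload, dict) and payload.get("status") == "error":
            return []
    raise NetworkException(
        "Got wrong response status code: {}".format(status_code))
```

With something like this, a device that exposes no VLANs would produce an empty result instead of suspending the whole handler, while other status codes would still raise.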

include_tasks instead of import_tasks

Hello,

By default on openSUSE 15.5 the Ansible version is 2.11.12, and since 2.8 there has been a change that should be considered.

I receive

The requested handler 'refresh prometheus' was not found in either the main handlers list nor in the listening handlers list 

and I suppose that all import_tasks calls, such as import_tasks: roles/prometheus/shared/add_config.yml, should be replaced by include_tasks, as in include_tasks: roles/prometheus/shared/add_config.yml.

The documentation mentions Ansible 2.10.7, which I guess is also affected by this change.

Role hardware_smartd does not support NVMe devices

In my system I have an NVMe drive:

# smartctl --scan
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device

The command smartctl --scan | grep -oP "^[A-z/]+" returns (note the missing 0 in /dev/nvme0):

/dev/sda
/dev/sdb
/dev/nvme

If I change the command to smartctl --scan | grep -oP "^[A-z0-9/]+" and change hardware_smartd\templates\smartd.conf to guess the device type (e.g. using -d auto), the issue is fixed on my system.

These changes are in 1beaf45; maybe you want to incorporate them.
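To illustrate the difference between the two patterns, here is a small sketch that applies them to the scan output above using Python's re module instead of grep. The helper name device_names is made up for this example.

```python
import re

SCAN_OUTPUT = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/nvme0 -d nvme # /dev/nvme0, NVMe device
"""


def device_names(scan_output, pattern):
    """Return the leading device path of each line, as matched by pattern."""
    names = []
    for line in scan_output.splitlines():
        m = re.match(pattern, line)  # re.match anchors at the line start
        if m:
            names.append(m.group(0))
    return names


old = device_names(SCAN_OUTPUT, r"[A-z/]+")     # digits are not matched
new = device_names(SCAN_OUTPUT, r"[A-z0-9/]+")  # digits are matched
print(old)  # ['/dev/sda', '/dev/sdb', '/dev/nvme']
print(new)  # ['/dev/sda', '/dev/sdb', '/dev/nvme0']
```

As a side note, the range A-z also covers the ASCII characters between Z and a (such as [ and _), so [A-Za-z0-9/] would be a slightly tighter character class.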

Smartd temperature is too high (HDD/SSD) not restrictive enough

Prometheus has some internal metrics, such as ALERTS_FOR_STATE, which contains the timestamp when an alert was triggered. This metric carries all alert labels plus one additional label, 'alertname'.

I think the expression for this alert is not restrictive enough: once the alert is triggered it is never cleared, not even after the condition that triggered it becomes false.

I propose changing the expression to the following, which excludes the ALERTS_FOR_STATE time series by matching only series without an alertname label:
{alertname="",chart=~"smartd_log.*",family="temperature",job="netdata"} > 50

I'm not sure whether this should be applied to other alerts as well.

The situation on my system after the alert has ended:
[image]

Some suggestions

I love this project, just what I was looking for. A couple of suggestions:

It would be great to have some kind of tree/graph showing the dependencies between the various subsystems. Then users could know which parts can be safely dropped or modified without side effects.

An overview of how Vagrant and Ansible interact to build out the system would help us understand how it all comes together.

Provide some simple FAQ/tutorial sections, something like:

  • How to add a printer/device
  • How to add a new category in the top heading

Need some way to handle self-signed certs in demo mode

Recent browser updates no longer allow self-signed certs to be accepted, and there don't seem to be any good instructions on how to overcome this. Two possible approaches:

  1. Provide step-by-step instructions on how to configure the latest releases of Firefox/Chrome to accept self-signed certs from Smartserver.

or

  2. Provide a mode switch in Smartserver to allow it to operate without self-signed certificates.

I really hope this can be fixed, I'd like to start developing with Smartserver.

Documentation question/suggestion

I was using parts of the repo to help build my own system. But now I'd like to also use the staging/production mix and migrate my ansible roles into this.

It's not clear in https://github.com/HolgerHees/smartserver/wiki/Setup:-Create#create-your-own-setup how to:

  • (from laptop) vagrant --config=demo --os=ubuntu up --provision works and provisions to the VM running locally.
  • (from laptop) sudo ansible-playbook -i config/demo/server.ini server.yml will only run if I add the production IP to config/demo/server.ini.

Point 5: copying files is unclear - copy config/demo/ to the production server?

Perhaps an example where we:

  1. mention adding the production IP to server.ini,
  2. run with (from laptop) ansible-playbook -i config/demo/server.ini --extra-vars "target=production" server.yml

At least this would be my thinking after playing around with your setup very briefly. I might be missing something so take all of this with a grain of salt.

system_update_check crash when system is recently updated

Regarding

result = subprocess.run([ "/usr/bin/zypper ps -s" ], shell=True, check=False, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, cwd=None )

On my system (recently updated) the output of the command

/usr/bin/zypper ps -s

is:

No processes using deleted files found.

No core libraries or services have been updated since the last system boot.
Reboot is probably not necessary.

The following exception is thrown:

Traceback (most recent call last):
  File "/opt/update_daemon/system_update_check", line 31, in <module>
    repo = plugin.Repository()
  File "/opt/update_daemon/plugins/os/opensuse.py", line 19, in __init__
    self.needs_restart = m.group(0) is not None
AttributeError: 'NoneType' object has no attribute 'group'

For the moment I fixed this with an additional check:

self.needs_restart = False
if m is not None:
    self.needs_restart = m.group(0) is not None

but I'm not sure whether the logic (needs_restart = False) is correct when the output says "Reboot is probably not necessary."
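The check can also be written by testing the match object directly, since re.search returns None when nothing matches. This is only a sketch: the regular expression actually used in opensuse.py is not shown in this report, so RESTART_PATTERN below is a hypothetical placeholder, and needs_restart is an illustrative helper name.

```python
import re

# Hypothetical pattern; the regex used by opensuse.py is not shown above.
RESTART_PATTERN = re.compile(r"Reboot is (suggested|required)")

# Output of "/usr/bin/zypper ps -s" as quoted in this report.
ZYPPER_OUTPUT = """\
No processes using deleted files found.

No core libraries or services have been updated since the last system boot.
Reboot is probably not necessary.
"""


def needs_restart(zypper_output, pattern=RESTART_PATTERN):
    """True only if the (assumed) restart pattern occurs in the output."""
    # A missing match yields None, so no .group() call is needed here.
    return pattern.search(zypper_output) is not None


print(needs_restart(ZYPPER_OUTPUT))  # False
```

Under that assumption, the "Reboot is probably not necessary." output simply results in needs_restart being False instead of raising AttributeError.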

Docker using local DNS

My ISP offers a parental control service, which I have enabled. When this feature is enabled, I guess all DNS requests to servers other than the local one (such as Google DNS 8.8.8.8) are dropped.

With parental control enabled on ISP side:

jupiter:~/git/smartserver # nslookup www.google.com 8.8.8.8
;; connection timed out; no servers could be reached

Without parental control enabled:

jupiter:~ # nslookup www.google.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:   www.google.com
Address: 142.250.180.228
Name:   www.google.com
Address: 2a00:1450:400d:80c::2004

Sometimes the smartserver Ansible scripts fail because of that. For example, just now, when you updated the Alpine version, the creation of the dnsmasq container failed. Names are not resolved because 8.8.8.8 (which Docker uses) is not reachable on port 53.

#5 [2/2] RUN apk --no-cache add dnsmasq tzdata\n
#5 0.081 fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/main/x86_64/APKINDEX.tar.gz\n
#5 5.085 WARNING: fetching https://dl-cdn.alpinelinux.org/alpine/v3.20/main: temporary error (try again later)\n
#5 5.085 fetch https://dl-cdn.alpinelinux.org/alpine/v3.20/community/x86_64/APKINDEX.tar.gz\n
#5 10.09 WARNING: fetching https://dl-cdn.alpinelinux.org/alpine/v3.20/community: temporary error (try again later)\n
#5 10.09 ERROR: unable to select packages:\n
#5 10.09   dnsmasq (no such package):\n
#5 10.09     required by: world[dnsmasq]\n
#5 10.09   tzdata (no such package):\n
#5 10.09     required by: world[tzdata]\n

After I disabled parental control on the ISP side, the Ansible scripts work as expected.

It's nothing critical, but I wanted to let you know; maybe there is an easy fix.

For the moment, my workarounds are:

  1. disable parental control on the ISP side (I want to avoid that);
  2. before running the Ansible script, modify /etc/resolv.conf and replace 127.0.0.1 with the gateway IP (where a DNS server is running that forwards to the ISP DNS).
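A third option that might be worth trying is to tell the Docker daemon explicitly which DNS servers containers should use, via the "dns" key in /etc/docker/daemon.json. As far as I know, Docker falls back to 8.8.8.8 when the host's resolver configuration only lists a loopback address, which would explain why workaround 2 helps. The sketch below only builds the JSON; the helper name with_dns_servers and the address 192.168.1.1 are placeholders, and I have not checked whether smartserver manages daemon.json itself or whether this also covers image builds.

```python
import json


def with_dns_servers(daemon_config, dns_servers):
    """Return a copy of a Docker daemon.json dict with the "dns" key set."""
    updated = dict(daemon_config)  # leave the caller's dict untouched
    updated["dns"] = list(dns_servers)
    return updated


# 192.168.1.1 is a placeholder for the gateway IP mentioned above.
existing = {}
config = with_dns_servers(existing, ["192.168.1.1"])
print(json.dumps(config, indent=4))
```

The printed JSON would go into /etc/docker/daemon.json, followed by a restart of the Docker daemon.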

trafficblocker blocklist KeyError

Not sure if it's a defect or a wrong configuration on my side; either way, I wanted to let you know.

I'm getting the following error:

[ERROR] - [m_service.lib.influxdb:66] - Traceback (most recent call last):
  File "/etc/system_service/lib/influxdb.py", line 62, in run
    messurements += callback()
                    ^^^^^^^^^^
  File "/etc/system_service/lib/trafficwatcher/trafficblocker/trafficblocker.py", line 264, in getMessurements
    messurements.append("trafficblocker,extern_ip={},blocking_state={},blocking_reason={},blocking_list={},blocking_count={} value=\"{}\"".format(ip, data["state"], data["reason"], data["blocklist"], data["count"], data["last"]))
                                                                                                                                                                                     ~~~~^^^^^^^^^^^^^
KeyError: 'blocklist'

If I delete the /dataDisk/var/lib/system_service/trafficblocker.json file (version 3) the error is gone, but the file is empty for the moment. Previously, the file contained entries like this (all missing blocklist):

"146.190.40.112": {
    "count": 1,
    "created": 1692759673.4107,
    "details": "scanning",
    "last": 1692759617.3242,
    "reason": "apache",
    "state": "blocked",
    "type": "unknown",
    "updated": 1692759673.4107
},
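If old entries without blocklist are expected to survive an upgrade, one tolerant way to build the measurement line is to read that key with a default. This is a sketch, not the project's code: the function name format_measurement and the default value "unknown" are assumptions, while the format string is taken from the traceback above.

```python
def format_measurement(ip, data):
    """Build the measurement line, tolerating entries without "blocklist"."""
    # "unknown" is an assumed default for entries written before the
    # blocklist field existed; the project may prefer another value.
    blocklist = data.get("blocklist", "unknown")
    return (
        "trafficblocker,extern_ip={},blocking_state={},blocking_reason={},"
        "blocking_list={},blocking_count={} value=\"{}\""
    ).format(ip, data["state"], data["reason"], blocklist,
             data["count"], data["last"])


# The entry quoted above, which has no "blocklist" key.
entry = {
    "count": 1,
    "created": 1692759673.4107,
    "details": "scanning",
    "last": 1692759617.3242,
    "reason": "apache",
    "state": "blocked",
    "type": "unknown",
    "updated": 1692759673.4107,
}
print(format_measurement("146.190.40.112", entry))
```

An alternative would be a one-time migration of the stored file that adds the missing key, which keeps the formatting code strict.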

Edge case - image_proxy.php never ends

Hello Holger,

I've tried your recent changes to the gallery and I think I found an edge case that may cause image_proxy.php to never end.

In my environment (the cameras are not configured to accept Basic auth, or the format returned is not recognized by Imagick) it seems that the curl_exec of the camera snapshot URL may throw an exception, after which apcu_delete( $url . ":fetch" ); is never called in subsequent calls and the loop never exits. The workaround is to restart the PHP container to clear the cache.

The PHP logs look like this (with some of your debugging statements uncommented):

NOTICE: PHP message: fetch http://hostname/ipcamera/cam01/ipcamera.jpg
NOTICE: PHP message: 0.0033631324768066
NOTICE: PHP message: fetched size 282
NOTICE: PHP message: calculated size 282
NOTICE: PHP message: PHP Fatal error:  Uncaught ImagickException: no decode delegate for this image format `' @ error/blob.c/BlobToImage/363 in /dataDisk/htdocs/gallery/image_proxy.php:148
Stack trace:
#0 /dataDisk/htdocs/gallery/image_proxy.php(148): Imagick->readImageBlob()
#1 /dataDisk/htdocs/gallery/image_proxy.php(101): scaleImage()
#2 /dataDisk/htdocs/gallery/image_proxy.php(65): fetchUrl()
#3 /dataDisk/htdocs/gallery/image_proxy.php(14): getData()
#4 {main}
  thrown in /dataDisk/htdocs/gallery/image_proxy.php on line 148

Maybe you could consider fixing this by handling such exceptions and cleaning up the cache.
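The general pattern I have in mind is a try/finally around the fetch, so the ":fetch" marker is removed even when decoding fails. The sketch below shows the idea in Python with a plain dict standing in for the APCu cache; all names are illustrative and it is not a translation of image_proxy.php. PHP supports try/finally as well, so the same structure should be possible there.

```python
cache = {}  # stand-in for the APCu cache


def fetch_with_marker(url, fetch, process):
    """Fetch and process url, always clearing the in-progress marker."""
    marker = url + ":fetch"
    cache[marker] = True  # signal to other requests that a fetch is running
    try:
        data = process(fetch(url))
        cache[url] = data
        return data
    finally:
        # Runs on success and on exceptions, so waiting requests can never
        # be stuck behind a stale marker.
        cache.pop(marker, None)
```

If process() raises (for example because the image format is not recognized), the exception still propagates to the caller, but the marker is gone and the next request can retry.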
