
projectnephos's Introduction

ProjectNephos

GSoC 2018 Project: Automate recording and uploading TV channels to the cloud.

License: GNU General Public License v2.0

Introduction

One of the functions of the Red Hen organisation is to record and archive the television streams it receives for future research. Project Nephos is an effort by CCExtractor to automate the entire process. Archiving is done by compressing the recordings and uploading them to Google Drive. In addition to downloading and archiving, Project Nephos provides the following functionality:

  1. Tagging of videos.
  2. Searching archived videos.
  3. Sharing videos with other entities.

Installation

Using PyPI

Not yet implemented. Will be delivered with v1.0.

Cloning from source

git clone https://github.com/AadityaNair/ProjectNephos.git
pip install ./ProjectNephos

Usage

Below is how you would use Nephos to perform actions manually. This requires the config file to be present; more information on the config file is in the Configuration section.

Uploading files

nephos upload <filename>

Searching

nephos search --name <name> --tags <tag1> <tag2> ... --do_and

Search for files with <name> and/or tags <tag1> <tag2> .... Whether the parameters are combined with AND or OR is decided by the --do_and parameter: if specified, all parameters (name, tags) are joined by AND, i.e. it will search for <name> AND <tag1> AND <tag2> ...; if not, the ANDs are replaced by ORs.

At least one of --name and --tags is required.
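
For example (hypothetical name and tag values), the following matches files whose name contains news AND that carry the tag cnn; dropping --do_and would match files satisfying either condition:

nephos search --name news --tags cnn --do_and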

Tagging

nephos tag --for_name <name> --add_tags <tag1> <tag2> ...

This searches for all instances whose name contains <name> and, for each of them, adds the provided tags.
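
For example, to add two hypothetical tags to every file whose name contains news:

nephos tag --for_name news --add_tags cnn politics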

Processing

nephos process <input_file> <output_file>

Converts the input file to the output file. The formats are guessed from the file extensions.
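
For example, converting a hypothetical recording from MPEG-TS to MP4:

nephos process recording.ts recording.mp4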

Permissions

Share uploaded videos with people based on the video tags.

nephos permission add --for_tags <tag1> <tag2> --share_with <email>

This command is persistent, meaning that all future videos with the tag will also be shared. To avoid this behaviour, pass --not_persistent to the command.

Note that the provided tags follow OR semantics, i.e. in the above example, every file with the tag tag1 OR tag2 will be shared.
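
For example, a one-off share that only applies to videos already uploaded (hypothetical tag and address):

nephos permission add --for_tags news --share_with alice@example.com --not_persistent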

To view all permissions,

nephos permission list

More information can be found for each sub-command by using the --help option after the sub-command.

Automation

For the most part, you just want to specify what to record and when, and leave Nephos to it. For that:

Add channels

Add a channel to specify where to download streams from:

nephos channel add --name 'CNN' --ip_string '1.2.3.4:5678'

Note that the name should be unique for each channel.

To view added channels:

nephos channel list

Add a job

Specify when to download, plus other post-download options:

nephos job add --name <jobname> --channel <channel> --start <starttime> --duration <length> \
               --upload --convert_to <format> --tag <tag1> <tag2>

The following arguments are mandatory:
--name is the name of the job. This should be unique for each job.
--channel is the name of the associated channel. This channel should already have been added with the channel add subcommand.
--start is the start time of the job, written in the popular cron format. For more info on the format, see the cron documentation.
--duration is how long you want to record, in minutes.

The rest are optional arguments:
--upload instructs Nephos to upload the file to Google Drive. This will most likely be the default in future versions, in which case this option will be removed.
--convert_to causes the downloaded file to be converted to the provided format before being uploaded.
--tag tags the uploaded file with the provided tags.

Note that --tag depends on the --upload option being provided; if it is not, --tag is a no-op.
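
Putting it together, a hypothetical job that records the CNN channel added earlier for 30 minutes every weekday at 09:00, converts the recording to MP4, uploads it, and tags it:

nephos job add --name 'cnn-morning' --channel 'CNN' --start '0 9 * * 1-5' --duration 30 \
               --upload --convert_to mp4 --tag news cnn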

TV Listings

Nephos also has a crude API that supports TV listings.

nephos schedule add --name <program_name> --channel <channel> --start <starttime> --duration <length> --tags <tag1> <tag2>

The syntax is pretty much exactly the same as for job add above. The tags are associated with the program. This allows for a separate syntax to add a job:

nephos job add --name <jobname> --program_tags <tag1> <tag2> .. \
               --upload --convert_to <format> --tag <tag1> <tag2>

This will find all programs with any of the provided tags and add them as jobs.
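
For example (hypothetical names and tags), register a program and then create jobs for everything tagged news:

nephos schedule add --name 'Morning News' --channel 'CNN' --start '0 8 * * *' --duration 60 --tags news morning
nephos job add --name 'all-news' --program_tags news --upload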

Initialise Server

This starts the orchestration server, which is responsible for the record -> process -> upload pipeline. It will also create all the relevant directories and perform OAuth with Google Drive, if not done already.

nephos init

Currently, if a job is added after the server is started, it will not be picked up by the server. So, make sure you add all the jobs before starting the server. This will be fixed in a later version.

Configuration

All files Project Nephos creates can be found in ~/.nephos/. Of particular interest is the config.ini file there, which contains Nephos's configuration. A default one will be created for you when you run init.
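
If you want to inspect the generated configuration programmatically, here is a minimal sketch using Python's standard configparser, assuming only the documented ~/.nephos/config.ini location:

import configparser
from pathlib import Path

# Load the config file that `nephos init` generates.
config = configparser.ConfigParser()
config.read(Path.home() / ".nephos" / "config.ini")

# Print every section and its key/value pairs.
for section in config.sections():
    print(section, dict(config[section]))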

For more information, you should look at the wiki. It will be updated frequently.

projectnephos's People

Contributors

aadityanair

projectnephos's Issues

Add jobs post server start.

Currently we use a blocking scheduler from APScheduler. That limits us to adding jobs only before the scheduler starts, which is a pretty big limitation.
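
One possible direction, sketched under the assumption that APScheduler's BackgroundScheduler is a drop-in replacement here (the job function and trigger are hypothetical):

from apscheduler.schedulers.background import BackgroundScheduler

def record_job():
    # Hypothetical stand-in for the real record -> process -> upload job.
    print("recording...")

scheduler = BackgroundScheduler()
scheduler.start()  # returns immediately, unlike BlockingScheduler.start()

# Jobs can now be added while the scheduler is already running.
scheduler.add_job(record_job, "cron", minute="*/30")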

[Permissions]: provide AND semantics

Currently, the command,

nephos permission add --for_tags <tag1> <tag2> --share_with <email>

shares all files with tag1 OR tag2 with the email. We would love to have an option for tag1 AND tag2.

Introduce timezones.

Having timezones will help in adding jobs on channels that don't actually run on the local timezone.
This can be done in three steps:

  1. Add default timezone for each channel.
  2. Ability to specify timezones while adding jobs.
  3. Make the scheduler use the provided timezone.

[handlers]: Consistent run and execute_command

run and execute_command are two interfaces for each handler. The former is for command-line usage while the latter is used when calling the handler programmatically.

Technically both should perform the same function, with run calling execute_command and parsing/pretty-printing its output. But right now, that is not the case for all of them. Most of the execute_command implementations implement functionality for the orchestration.

Either have it consistent or drop it as a common interface.
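
A minimal sketch of the proposed contract (names are illustrative, not the current Nephos code): run delegates all functionality to execute_command and only handles argument translation and presentation.

class BaseHandler:
    def execute_command(self, **kwargs):
        # Pure functionality; returns data and never prints. Used by the orchestrator.
        raise NotImplementedError

    def run(self, args):
        # CLI entry point: unpack parsed args, then pretty-print the result.
        result = self.execute_command(**vars(args))
        print(result)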

Custom preprocessing script.

The current preprocessing script just converts from one format to another.
We could have an option for running a custom script. The format could be as follows:
command <parameters> {input} {output}
This is a literal string; the {input} and {output} placeholders are replaced by Nephos itself.

This setting can be per job/channel or system-wide if need be.
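
A minimal sketch of how such a template could be expanded and executed (the ffmpeg command is only an illustrative value):

import shlex
import subprocess

def run_custom_script(template: str, input_path: str, output_path: str) -> None:
    # Substitute the literal {input}/{output} placeholders, then run the command.
    command = template.format(input=input_path, output=output_path)
    subprocess.run(shlex.split(command), check=True)

run_custom_script("ffmpeg -i {input} {output}", "in.ts", "out.mp4")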

Support nested folders.

We currently only support one level of folders. Folders within folders would be nice for structuring.

Better initialisation

The current initialisation is not good enough:

  1. Do OAuth during init.
  2. Better handling of custom config files.
  3. Auto-create folders.

Make config singleton

That would probably save us the headache of passing the config around every time it is needed, and would even allow the config to be updated at runtime without any object passing.
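
A minimal sketch of one way to do this (module and names hypothetical): a module-level instance, so every import sees the same object.

import configparser

class _Config:
    def __init__(self):
        self.parser = configparser.ConfigParser()

    def load(self, path):
        self.parser.read(path)

# Module-level singleton: e.g. `from nephos.config import config` everywhere.
config = _Config()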

Update items.

Currently there is no update operation for channels, jobs, and permissions.
Add them.

Maintenance tasks.

A lot of maintenance tasks need to run.

  1. Check disk space and stop downloading tasks.
  2. Delete uploaded files.
  3. ...etc.

Initialise handlers only when called

Currently, all handlers are initialized every time the CLI application is invoked. This will incur significant costs later. Initialize handlers only when their specific subcommand is called.

Pretty print output

Currently all output of the tool is bare-minimum and lackluster. We can improve that quite a lot.

When upload is not specified

When --upload is not specified for a job, a lot of random things may happen in the pipeline.
Fix it, probably by adding a simple if.

Drive backend

The current Drive backend does not handle the case where the file is actually in the trash or in a team drive.

Support more options to specify job start time.

As of now, the only way to specify the job start time is through a string in cron format. That may not be the most intuitive form of input for most people. Add a means to specify the time in a simpler way.

Add some badges

They should include:

  1. coverage
  2. build status
  3. standards compliance
  4. etc

Better sub-subcommands.

The subcommands in job, channel, and permission are not the best looking.

Use argparse.add_subparsers and reimplement them better.

Ignore errors

Add an option to ignore errors and continue. We will still log them, of course.

Resumable uploads

GDrive supports resumable uploads. This will be useful for large files.
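
A minimal sketch using google-api-python-client, assuming that is the library behind the Drive backend (service is an authenticated Drive v3 service object):

from googleapiclient.http import MediaFileUpload

media = MediaFileUpload("video.mp4", resumable=True, chunksize=1024 * 1024)
request = service.files().create(body={"name": "video.mp4"}, media_body=media)

# Upload chunk by chunk; after a crash, the upload session could be persisted and resumed.
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        print(f"Uploaded {int(status.progress() * 100)}%")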

Channel specific tags.

We should allow for tags that all jobs from a channel will have.
This allows us to not repeat tags for each job on a channel.

Test coverage

Currently we have very basic test coverage, mostly done just to have Coveralls actually look at it.
Add a lot more tests everywhere.

Reliable Jobs

Currently, once an upload pipeline has started, any unexpected crash will abort the whole process forever.

Use the database to track which parts of a job have completed, and decouple jobs accordingly.

Validate user input.

We currently assume that all inputs are correct and directly pluggable into the program. That will never be the case.
We have two starting points:

  1. The values provided in the config.
  2. Options provided in the commandline.

This can be more than just format and structural validation; IP addresses can be validated to be actually pingable, for example.
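
A minimal sketch of structural validation for the documented host:port ip_string (function name hypothetical), using the standard ipaddress module:

import ipaddress

def is_valid_ip_string(ip_string: str) -> bool:
    # Expect the documented 'host:port' form, e.g. '1.2.3.4:5678'.
    host, sep, port = ip_string.rpartition(":")
    if not sep or not port.isdigit():
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return 0 < int(port) < 65536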

Catch bad channels.

Some channels may not be working, or their streams may have corrupted data that cannot be read/processed/uploaded.
We should be able to catch such channels.

LOT more error handling

There are a lot of places where there is no actual error handling. One can try various inputs and see.
We would like to identify where things can go wrong and raise proper errors there.

Multiple storage backends

This requires a few things to be done.

  1. All handlers need only call one object.
  2. This object deals with all the individual backends.
  3. (maintainability) This object deals with all the logging for the backends.
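
A minimal sketch of that single dispatch object (all names hypothetical):

from abc import ABC, abstractmethod

class StorageBackend(ABC):
    @abstractmethod
    def upload(self, path):
        """Upload a file and return a backend-specific file ID."""

class Storage:
    # The one object the handlers talk to; it fans out to the backends and logs.
    def __init__(self, backends):
        self.backends = backends

    def upload(self, path):
        return [backend.upload(path) for backend in self.backends]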

Improve DB interface.

The current DB interface leaves a lot to be desired.

  1. Handle transaction failures.
  2. No need to commit after every action.
  3. Standardised return values for each action.
  4. Close session after each use.

Unable to catch sqlalchemy integrity errors.

Something is wrong with it; it needs more investigation.
Basic code:
Basic code:

try:
    self.db.add_job(
        args.name,
        args.channel,
        args.start,
        args.duration,
        args.upload,
        args.convert_to,
        args.tags,
    )
except IntegrityError:
    # SQLAlchemy raises IntegrityError when the session flushes/commits, so if
    # add_job defers the commit, the error can surface outside this try block.
    if len(self.db.get_channels(name=args.channel)) == 0:
        logger.critical(...)  # truncated in the original snippet

Logging

The logging isn't as extensive as it could be. More extensive logging would be nice for debugging.
