
projectnephos's Introduction

ProjectNephos

GSoC 2018 Project: Automate recording and uploading TV channels to the cloud.

License: GNU General Public License v2.0

Introduction

One of the functions of the Red Hen organisation is to record and archive the television streams it receives for future research. Project Nephos is an effort by CCExtractor to automate the entire process. Archiving is done by compressing the recordings and uploading them to Google Drive. In addition to downloading and archiving, Project Nephos provides the following functionality:

  1. Tagging of videos.
  2. Searching archived videos.
  3. Sharing videos with other entities.

Installation

Using PyPI

Not yet implemented. Will be delivered with v1.0.

Cloning from source

git clone https://github.com/AadityaNair/ProjectNephos.git
pip install ./ProjectNephos

Usage

Below is how you would use Nephos to perform actions manually. This requires the config file to be present; more information on the config file is in the Configuration section.

Uploading files

nephos upload <filename>

Searching

nephos search --name <name> --tags <tag1> <tag2> ... --do_and

Search for files with <name> and/or tags <tag1> <tag2> .... Whether the parameters are combined with AND or OR is decided by the --do_and parameter: if specified, all parameters (name, tags) are joined by AND, i.e. it will search for <name> AND <tag1> AND <tag2> ...; if not, the ANDs are replaced by ORs.

At least one of --name and --tags is required.
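
For example (hypothetical name and tag values), the following matches files whose name contains news AND that carry the tag cnn; dropping --do_and would match files satisfying either condition:

nephos search --name news --tags cnn --do_and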

Tagging

nephos tag --for_name <name> --add_tags <tag1> <tag2> ...

This searches for all instances whose name contains <name> and, for each of them, adds the provided tags.
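
For example, to add two hypothetical tags to every file whose name contains news:

nephos tag --for_name news --add_tags cnn politics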

Processing

nephos process <input_file> <output_file>

Converts the input file to the output file. The formats are guessed from the file extensions.
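
For example, converting a hypothetical recording from MPEG-TS to MP4:

nephos process recording.ts recording.mp4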

Permissions

Share uploaded videos with people based on the video tags.

nephos permission add --for_tags <tag1> <tag2> --share_with <email>

This command is persistent, meaning that all future videos with the tag will also be shared. To avoid this behaviour, pass --not_persistent to the command.

Note that the provided tags follow OR semantics, i.e. in the above example, every file with the tag tag1 OR tag2 will be shared.
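
For example, a one-off share that only applies to videos already uploaded (hypothetical tag and address):

nephos permission add --for_tags news --share_with alice@example.com --not_persistent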

To view all permissions,

nephos permission list

More information can be found for each sub-command by using the --help option after the sub-command.

Automation

For the most part, you just want to specify what to record and when, and leave Nephos to it. For that:

Add channels

Add a channel to specify where to download streams from:

nephos channel add --name 'CNN' --ip_string '1.2.3.4:5678'

Note that the name should be unique for each channel.

To view added channels:

nephos channel list

Add a job

Specify when to download, plus other post-download options:

nephos job add --name <jobname> --channel <channel> --start <starttime> --duration <length> \
               --upload --convert_to <format> --tag <tag1> <tag2>

The following arguments are mandatory:
--name is the name of the job. This should be unique for each job.
--channel is the name of the associated channel. This channel should already have been added with the channel add subcommand.
--start is the start time of the job, written in the popular cron format. For more info on the format, see the cron documentation.
--duration is how long you want to record, in minutes.

The rest are optional arguments:
--upload instructs Nephos to upload the file to Google Drive. This will most likely be the default in future versions, in which case this option will be removed.
--convert_to causes the downloaded file to be converted to the provided format before being uploaded.
--tag tags the uploaded file with the provided tags.

Note that --tag depends on the --upload option being provided; if it is not, --tag is a no-op.
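
Putting it together, a hypothetical job that records the CNN channel added earlier for 30 minutes every weekday at 09:00, converts the recording to MP4, uploads it, and tags it:

nephos job add --name 'cnn-morning' --channel 'CNN' --start '0 9 * * 1-5' --duration 30 \
               --upload --convert_to mp4 --tag news cnn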

TV Listings

Nephos also has a crude API that supports TV listings.

nephos schedule add --name <program_name> --channel <channel> --start <starttime> --duration <length> --tags <tag1> <tag2>

The syntax is pretty much exactly the same as for job add above. The tags are associated with the program. This allows for a separate syntax to add a job:

nephos job add --name <jobname> --program_tags <tag1> <tag2> .. \
               --upload --convert_to <format> --tag <tag1> <tag2>

This will find all programs with any of the provided tags and add them as jobs.
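
For example (hypothetical names and tags), register a program and then create jobs for everything tagged news:

nephos schedule add --name 'Morning News' --channel 'CNN' --start '0 8 * * *' --duration 60 --tags news morning
nephos job add --name 'all-news' --program_tags news --upload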

Initialise Server

This starts the orchestration server, which is responsible for the record -> process -> upload pipeline. It will also create all the relevant directories and perform OAuth with Google Drive, if not done already.

nephos init

Currently, if a job is added after the server is started, it will not be picked up by the server. So, make sure you add all the jobs before starting the server. This will be fixed in a later version.

Configuration

All files Project Nephos creates can be found in ~/.nephos/. Of particular interest is the config.ini file there, which contains Nephos's configuration. A default one will be created for you when you run init.
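
If you want to inspect the generated configuration programmatically, here is a minimal sketch using Python's standard configparser, assuming only the documented ~/.nephos/config.ini location:

import configparser
from pathlib import Path

# Load the config file that `nephos init` generates.
config = configparser.ConfigParser()
config.read(Path.home() / ".nephos" / "config.ini")

# Print every section and its key/value pairs.
for section in config.sections():
    print(section, dict(config[section]))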

For more information, you should look at the wiki. It will be updated frequently.

projectnephos's People

Contributors

aadityanair

projectnephos's Issues

Add jobs post server start.

Currently we use a blocking scheduler from APScheduler. That limits us to adding jobs only before the scheduler starts, which is a pretty big limitation.
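
One possible direction, sketched under the assumption that APScheduler's BackgroundScheduler is a drop-in replacement here (the job function and trigger are hypothetical):

from apscheduler.schedulers.background import BackgroundScheduler

def record_job():
    # Hypothetical stand-in for the real record -> process -> upload job.
    print("recording...")

scheduler = BackgroundScheduler()
scheduler.start()  # returns immediately, unlike BlockingScheduler.start()

# Jobs can now be added while the scheduler is already running.
scheduler.add_job(record_job, "cron", minute="*/30")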

[Permissions]: provide AND semantics

Currently, the command,

nephos permission add --for_tags <tag1> <tag2> --share_with <email>

shares all files with tag1 OR tag2 with the email. We would love to have an option for tag1 AND tag2.

Introduce timezones.

Having timezones will help in adding jobs on channels that don't actually run on the local timezone.
This can be done in three steps:

  1. Add default timezone for each channel.
  2. Ability to specify timezones while adding jobs.
  3. Make the scheduler use the provided timezone.

[handlers]: Consistent run and execute_command

run and execute_command are two interfaces for each handler. The former is for command-line usage while the latter is used when calling the handler programmatically.

Technically both should perform the same function, with run calling execute_command and parsing/pretty-printing its output. But right now, that is not the case for all of them. Most of the execute_command implementations implement functionality for the orchestration.

Either have it consistent or drop it as a common interface.
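
A minimal sketch of the proposed contract (names are illustrative, not the current Nephos code): run delegates all functionality to execute_command and only handles argument translation and presentation.

class BaseHandler:
    def execute_command(self, **kwargs):
        # Pure functionality; returns data and never prints. Used by the orchestrator.
        raise NotImplementedError

    def run(self, args):
        # CLI entry point: unpack parsed args, then pretty-print the result.
        result = self.execute_command(**vars(args))
        print(result)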

Custom preprocessing script.

The current preprocessing script just converts from one format to another.
We could have an option for running a custom script. The format could be as follows:
command <parameters> {input} {output}
This is a literal string; the {input} and {output} placeholders are replaced by Nephos itself.

This setting can be per job/channel or system-wide if need be.
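
A minimal sketch of how such a template could be expanded and executed (the ffmpeg command is only an illustrative value):

import shlex
import subprocess

def run_custom_script(template: str, input_path: str, output_path: str) -> None:
    # Substitute the literal {input}/{output} placeholders, then run the command.
    command = template.format(input=input_path, output=output_path)
    subprocess.run(shlex.split(command), check=True)

run_custom_script("ffmpeg -i {input} {output}", "in.ts", "out.mp4")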

Support nested folders.

We currently only support one level of folders. Folders within folders would be nice for structuring.

Better initialisation

The current initialisation is not good enough:

  1. Do OAuth during init.
  2. Better handling of custom config files.
  3. Auto-create folders.

Make config singleton

That would probably save us the headache of passing the config around every time it is needed, and would even allow the config to be updated at runtime without any object passing.
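
A minimal sketch of one way to do this (module and names hypothetical): a module-level instance, so every import sees the same object.

import configparser

class _Config:
    def __init__(self):
        self.parser = configparser.ConfigParser()

    def load(self, path):
        self.parser.read(path)

# Module-level singleton: e.g. `from nephos.config import config` everywhere.
config = _Config()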

Update items.

Currently there is no update operation for channels, jobs, and permissions.
Add them.

Maintenance tasks.

A lot of maintenance tasks need to run.

  1. Check disk space and stop downloading tasks.
  2. Delete uploaded files.
  3. ...etc.

Initialise handlers only when called

Currently, all handlers are initialized every time the CLI application is invoked. This will incur significant costs later. Initialize handlers only when their specific subcommand is called.

Pretty print output

Currently all output of the tool is bare-minimum and lackluster. We can improve that quite a lot.

When upload is not specified

When --upload is not specified for a job, a lot of random things may happen in the pipeline.
Fix it, probably by adding a simple if.

Drive backend

The current Drive backend does not handle the case where the file is actually in the trash or in a team drive.

Support more options to specify job start time.

As of now, the only way to specify the job start time is through a string in cron format. That may not be the most intuitive form of input for most people. Add a means to specify the time in a simpler way.

Add some badges

They should include:

  1. coverage
  2. build status
  3. standards compliance
  4. etc

Better sub-subcommands.

The subcommands in job, channel, and permission are not the best looking.

Use argparse.add_subparsers and reimplement them better.

Ignore errors

Add an option to ignore errors and continue. We will still log them, of course.

Resumable uploads

GDrive supports resumable uploads. This will be useful for large files.
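
A minimal sketch using google-api-python-client, assuming that is the library behind the Drive backend (service is an authenticated Drive v3 service object):

from googleapiclient.http import MediaFileUpload

media = MediaFileUpload("video.mp4", resumable=True, chunksize=1024 * 1024)
request = service.files().create(body={"name": "video.mp4"}, media_body=media)

# Upload chunk by chunk; after a crash, the upload session could be persisted and resumed.
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        print(f"Uploaded {int(status.progress() * 100)}%")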

Channel specific tags.

We should allow for tags that all jobs from a channel will have.
This allows us to not repeat tags for each job on a channel.

Test coverage

Currently we have very basic test coverage, mostly done just to have Coveralls actually look at it.
Add a lot more tests everywhere.

Reliable Jobs

Currently, once an upload pipeline has started, any unexpected crash will abort the whole process forever.

Use the database to track which parts of a job have completed, and decouple jobs accordingly.

Validate user input.

We currently assume that all inputs are correct and directly pluggable into the program. That will never be the case.
We have two starting points:

  1. The values provided in the config.
  2. Options provided in the commandline.

This can be more than just format and structural validation; IP addresses can be validated to be actually pingable, for example.
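
A minimal sketch of structural validation for the documented host:port ip_string (function name hypothetical), using the standard ipaddress module:

import ipaddress

def is_valid_ip_string(ip_string: str) -> bool:
    # Expect the documented 'host:port' form, e.g. '1.2.3.4:5678'.
    host, sep, port = ip_string.rpartition(":")
    if not sep or not port.isdigit():
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return 0 < int(port) < 65536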

Catch bad channels.

Some channels may not be working, or their streams may have corrupted data that cannot be read/processed/uploaded.
We should be able to catch such channels.

LOT more error handling

There are a lot of places where there is no actual error handling. One can try various inputs and see.
We would like to identify where things can go wrong and raise proper errors there.

Multiple storage backends

This requires a few things to be done.

  1. All handlers need only call one object.
  2. This object deals with all the individual backends.
  3. (maintainability) This object deals with all the logging for the backends.
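
A minimal sketch of that single dispatch object (all names hypothetical):

from abc import ABC, abstractmethod

class StorageBackend(ABC):
    @abstractmethod
    def upload(self, path):
        """Upload a file and return a backend-specific file ID."""

class Storage:
    # The one object the handlers talk to; it fans out to the backends and logs.
    def __init__(self, backends):
        self.backends = backends

    def upload(self, path):
        return [backend.upload(path) for backend in self.backends]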

Improve DB interface.

The current DB interface leaves a lot to be desired.

  1. Handle transaction failures.
  2. No need to commit after every action.
  3. Standardised return values for each action.
  4. Close session after each use.

Unable to catch sqlalchemy integrity errors.

Something is wrong with it; it needs more investigation.
Basic code:
Basic code:

try:
    self.db.add_job(
        args.name,
        args.channel,
        args.start,
        args.duration,
        args.upload,
        args.convert_to,
        args.tags,
    )
except IntegrityError:
    # SQLAlchemy raises IntegrityError when the session flushes/commits, so if
    # add_job defers the commit, the error can surface outside this try block.
    if len(self.db.get_channels(name=args.channel)) == 0:
        logger.critical(...)  # truncated in the original snippet

Logging

The logging isn't as extensive as it could be. More extensive logging would be nice for debugging.
