
rajatkb / conference-notify

11 stars · 2 watchers · 22 forks · 3.57 MB

Conference-Notify is an open-source, web-based application that aggregates conference information and allows users to search it and to create recurring reminders and feeds for themselves.

License: GNU General Public License v3.0

Python 24.43% TypeScript 24.31% JavaScript 0.91% HTML 49.05% CSS 1.29%
research notifier-service conference nodejs python webscraping pymongo wikicfp finding-relevant-conferences recurring-notifiers


conference-notify's Issues

Add CODE_OF_CONDUCT.md

Is your feature request related to a problem? Please describe.
No CODE_OF_CONDUCT.md file is present.

Describe the solution you'd like
@rajatkb As a GSSOC participant I would like to add a CODE_OF_CONDUCT.md file in this project. Using this as a reference.


[Base] design suggestion for Search-Service

The search service needs to serve user searches over the data stored by the Scrapper-Service.

REQUIREMENTS

  • Should expose a simple search API over Elasticsearch
  • Ideally the solution minimizes the data that has to be stored and indexed in Elasticsearch while still returning relevant results
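A minimal sketch of the kind of query body such a search API might send to Elasticsearch, assuming only a small subset of the scraped document (title, categories, url, deadline) is indexed; the field names here are assumptions, not the project's actual schema:

```python
def build_search_query(text, size=10):
    """Build a multi_match Elasticsearch query over a minimal field set."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Boost title matches over category matches (assumed fields).
                "fields": ["title^2", "categories"],
                "fuzziness": "AUTO",
            }
        },
        # Return only the fields the UI needs; this keeps the response
        # (and the index) small, per the requirement to minimize stored data.
        "_source": ["title", "url", "deadline"],
    }

body = build_search_query("wireless")
```

Keeping the indexed document this thin means Elasticsearch only holds what search needs, while the full records stay in MongoDB.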

[Improve] refactor app.py

Requirement

  • Clean up app.py by moving the argument-parsing code to a separate class:
    arguments.py -> Arguments in utility
  • Move the configuration-parsing code to a separate class:
    configuration.py -> Configuration in utility

Where to look @

  • app.py

Code change located

  • app.py
  • utility module

[Feature] Email Subscription system

Create a new controller and endpoint that triggers a mail to the email address provided through the rest endpoint. A user would hit the rest endpoint /subscribe/[email protected] with a payload of filters and then get updates whenever new conferences are entered into the system or existing ones are updated.

Describe the solution you'd like
On hitting the rest endpoint, the controller registers the new email along with its respective filters. A new listener attached to changeStream events analyzes the event data against the filters for each email, and a notification is then sent to the matching addresses.

Additional context

REQUIREMENTS

  • Create a Proper extensible Mongo Schema for representing an email. (Mongo Model)
  • Create proper filter parameters for the payload of registering an email. ( Controller / Service)
  • Create a new listener that filters out the emails to notify for each event of new conference data insertion/update. Leverage a MongoDB query to get the filtered emails. (Listener)

Add ISSUE_TEMPLATE

@rajatkb @sagar-sehgal I am a GSSOC participant and I would like to add an issue template folder with pull request template and a new issue template.

[Improve] introduce multiprocessing in main.py for Scrapper-Service

The current implementation runs sequentially and can use only one core of a multicore system, something that should be leveraged.
REQUIREMENT

  • Whether an individual Scrapper does its processing with threads or async should not affect launching the various Scrappers via multiprocessing

[Feat.] insert to mongo db on having greater datestamp for deadline, trigger a rest call for same

The Database class currently does not check the deadline constraint. Any scrapper plugin may overwrite the deadline of a mongo document, which can leave an inconsistent or stale date. Insertion should be avoided in such cases.

REQUIREMENT

  • Figure out a way to do the deadline date check.
  • Achieve this without doing a (search -> bring data to application memory -> insert)
  • Doing the above would only add latency per insertion (if locks are used to protect the critical section) and bring in problems such as inconsistent writes once the scrappers are made parallel.
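One way to meet the last two points is to push the comparison into the update filter itself, so MongoDB performs the deadline check during the write. A hedged pymongo-style sketch (collection and field names are assumptions):

```python
def build_deadline_guarded_update(conference_id, new_deadline, fields):
    """Build filter/update documents for a deadline-guarded update.

    The deadline comparison runs inside MongoDB: the filter matches only
    when the stored deadline is older than the incoming one, so no
    document is brought into application memory and no lock is needed.
    """
    filter_doc = {
        "_id": conference_id,
        # Match only documents whose stored deadline is strictly older.
        "deadline": {"$lt": new_deadline},
    }
    update_doc = {"$set": dict(fields, deadline=new_deadline)}
    return filter_doc, update_doc

# Hypothetical usage with a pymongo collection:
#   collection.update_one(*build_deadline_guarded_update(cid, dl, data))
```

Note that combining this filter with upsert=True needs care: for an existing document whose deadline is already newer, the filter will not match and an upsert attempt would collide with the unique _id, so inserts and guarded updates may have to be issued as separate statements.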

Design Suggestion for Notifier-Service

Suggest a tech stack and a possible solution of how we can build a notification service based on the data collected.

Requirement

  • The end user application should be able to subscribe on the end point given by the service
  • Should also be able to cater to browser service worker hooks for notification purpose
  • Try to avoid user authentication for this service. We are not focusing on user tracking

[Feat.] implement routines for Conference service in Notifier-service application

This issue depends on #34, #45 and #46, so anyone who intends to address this will have to pick up and address those issues first. The issue is also linked with #36, given that it will impact the performance of the response.

REQUIREMENTS
-> complete implementation of conference.ts

Where to look
-> src/services/conference.ts

PS: To implement them, just follow the comments given in the file.

[Bug] logger does not accept unicode characters, throws an error, breaks the main thread

Describe the bug
Unicode character not getting accepted in Logger

To Reproduce
Steps to reproduce the behavior:

  1. Run app.py
  2. Wait for some time; eventually the scrapper will encounter a unicode character and display an unhandled error on the main thread.
  3. This blocks the main thread, which is hard to close without a kill command

Expected behavior
Unicode characters should not raise such an issue. Logs will include data from the web, so unicode should be expected.

Environment:

  • Python: Python 3.7
  • Logger: logging (standard library)

Additional context
REQUIREMENT

  • Fix bug
  • Write unit test to check the issue.
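A minimal sketch of the likely fix, assuming the logger writes through a file handler: forcing utf-8 on the handler keeps non-ASCII data from the web from raising UnicodeEncodeError on the main thread. The handler setup below is an assumption, not the project's actual logging configuration:

```python
import logging

def build_logger(name, logfile):
    """Configure a logger whose file handler accepts unicode.

    Passing encoding="utf-8" to FileHandler prevents UnicodeEncodeError
    when scraped pages contain non-ASCII characters, so the main thread
    never sees an unhandled logging exception.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(logfile, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

log = build_logger("scrapper", "scrapper.log")
log.info("conference in Zürich — 机器学习")  # unicode must not raise
```

A unit test for the fix can simply log a non-ASCII string and assert the process survives and the log file contains it.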

[Bug] Scrapper service, when run on a Windows machine, stops after running a few hours

The service is written in Python, so that should be taken into consideration.

  • The process stops printing logs after a few hours of execution, although by that time a huge amount of data has been collected in MongoDB, so updates are frequent.
  • Without logs or any new insertion updates from MongoDB, it's difficult to detect what is going wrong.
  • The process, however, does not crash, as is evident from Task Manager.
  • There is no resource crunch: it's a 16 GB RAM machine with an i7-4770, and both CPU and RAM utilization are minimal.

I will post a screenshot once this happens again.

[Test] add unit test for Conference datamodel

The Conference data model is to be used by other contributors for stashing data from the scrappers into MongoDB. The class should be tested for possible edge cases which could lead to faulty data being inserted in the database.

Requirement

  • Unit test for the Conference data model.

Extra

  • If possible, reorganize the test scripts under a single folder, with running instructions.

[Docs] update Scrapper-Service Readme.md with Datamodel Schema used.

The data model schemas are missing from the readme. This needs to be included in order to help others work with their mongo instances and query the data.

REQUIREMENTS

  1. The readme inside Scrapper-Service (not the root one) should contain a schema of the mongodb data

  2. Make any necessary updates to the readme that will help people understand what is indexed, and how insertion happens for multiple scrappers.

[Improve] Update the main README.md, CONTRIBUTING.md and Scrapper-Service/README.md

Some of the improvements that must be made to the documentation:

  • In the Prerequisites section, for the various tools like Mongo DB, Angular, etc., please provide a link to the installation instructions for Windows and Linux systems.
  • The /Scrapper-Service folder has requirements.txt file. Please add instructions for creating the virtual environment and installing the /Scrapper-Service/requirements.txt.
  • For Deployment of Scrapper-Service section, in the instructions, change main.py to app.py.
  • For Deployment section, separate all the commands into different blocks, instead of putting all of them in one block. Further, please remove the >> symbol in the beginning, because currently, it becomes difficult to copy the complete command in a double click.
  • Also, similar work has to be done in the ATTENTION section of CONTRIBUTING.md file for separating all the commands into different blocks, instead of putting all of them in one block.
  • Add instructions on how to run the Scraper Service in different modes in the /Scrapper-Service/README.md.

[Feature] Log in system with JWT/Session

The system currently needs a login system that would allow users to register and curate a feed for themselves through email or through the frontend application, including setting up reminders of their choice.

Describe the solution you'd like
Use Passport or a custom implemented solution (which should be properly scrutinised for its security) to enable user login and to establish user identity when creating routes.

Additional context

  • Start by creating a Mongo schema for a user, along with a model to interact with it (look in the Model folder)
  • Implement a service that can act as the auth service.
  • Implement middleware, or use passport, to inject user session data or validate the user using a jwt.

NOTE: While hashing passwords, make sure to take note of the crypto package's synchronous nature. Explore crypto-async once to see if it's a good fit.

[Improve] add adaptive recurring non-blocking timeout handling for the Scrapper.py get_page method

Requirement

  • get_page currently uses a static, manually set timeout duration. It would be great to have an adaptive get_page method which does not time out because of minor network stability issues but also does not wait indefinitely.

Where to look

  • Scrapper.py inside commons

Update: An implementation of exponential wait-time increase is provided in the utility package as the AdaptiveRequest class. The issue is not resolved: on a faulty network the system may raise its wait time indefinitely, i.e. leading back to the old problem of an effectively unbounded wait per request.
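One way to address the open problem in the update is to cap the exponential growth, so a faulty network can never push the wait time past a fixed bound. A sketch, with `fetch` standing in for the real request call (the actual AdaptiveRequest interface is an assumption):

```python
def get_page_with_backoff(fetch, url, base=1.0, factor=2.0, cap=60.0, retries=5):
    """Retry fetch(url, timeout=...) with exponentially growing, capped timeouts.

    The cap bounds the adaptive growth, so a persistently faulty network
    cannot drive the wait time up indefinitely; the retry limit bounds
    the total time spent on one page.
    """
    timeout = base
    last_error = None
    for _ in range(retries):
        try:
            return fetch(url, timeout=timeout)
        except TimeoutError as err:
            last_error = err
            # Grow the timeout for the next attempt, but never past the cap.
            timeout = min(timeout * factor, cap)
    raise TimeoutError(f"{url} unreachable after {retries} attempts") from last_error
```

With base=1, factor=2 and cap=60, the attempted timeouts are 1, 2, 4, 8, 16 seconds and would plateau at 60 rather than growing forever.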

[Improv.] feature suggestion for the Service

Currently, our application in development is planned to have these features.

  1. Single source of information for all conferences out there, with text search capability

  2. Reminders and notifications, if subscribed, for:

     • new conferences in followed categories
     • deadline extensions
     • changes of conference timing (in times of pandemic like these days)

  3. Reminders can be sent through:

     • emails
     • push notifications to the user's browser

But more exciting and usable features do not hurt, so I am opening this issue so that people can lend their suggestions. If acceptable, we can even create new issues and have people work on them. I am hoping this platform can become the Arxiv-sanity of conferences.

[Improv.] introduce CORS to the Notifier-Service express application

Is your feature request related to a problem? Please describe.
The current version of the application does not have CORS enabled, which means that down the road the frontend application (deployed and served from a separate server origin) will not be able to access our api. For this we will need CORS enabled.

Describe the solution you'd like
Include the cors package in the express app, configured to accept requests from any origin.

REQUIREMENT

  1. In app.js of Notifier-Service include the cors package.

[Bug] categories in conference collection is getting duplicated over multiple runs.

Describe the bug
The categories field is getting populated with the same data repeatedly over multiple runs, in the conference collection.

To Reproduce
Steps to reproduce the behavior:

  1. Run the Scrapper-Service using python app.py -l debug -ls console --test True
  2. Close it using Ctrl-C and run it again.
  3. Check the conference categories data; you will see the same categories repeated

Expected behaviour
There should be unique entries in the conference categories field.

Additional context
The bug is in the insertion logic; it needs a fix ASAP.
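If the insertion logic appends categories with an update operator, one likely fix is to replace $push with $addToSet, which is idempotent across reruns. A sketch of the update document (the surrounding update shape is an assumption; only the categories field name is taken from the bug report):

```python
def build_category_update(categories):
    """Build an update that cannot duplicate categories across reruns.

    $addToSet with $each appends only values not already present, so
    restarting the Scrapper-Service and re-inserting the same conference
    leaves the categories array unchanged.
    """
    return {"$addToSet": {"categories": {"$each": list(categories)}}}

update = build_category_update(["wireless", "internet"])
```

Running the same update twice against a document is then a no-op for the categories field, which is exactly the expected behaviour described above.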

[Improv.] improve schema for current mongodb storage , especially for category indexing

The category field in the current schema is storing an array of values.

"categories" : [ "semantic web", "wireless", "web services", "internet" ],

This cannot be indexed usefully, as the values are not unique across entries in the database. We have some queries that specifically require getting the entries corresponding to one category, like

query: get all conferences of category "wireless"

For such a query the complexity rises to O(n) for the database. We need a solution to this problem.

[Feat.] Adding a Continuous Integration (CI) Tool

Since we are following Test Driven Development and some tests have already been added, it would be nice to integrate a Continuous Integration tool like Travis CI or GitHub Actions.

This will help check whether a new PR breaks the existing code, by building the project and running the application's tests.

[Base] design suggestion for [Angular] front end application

Start the angular front-end application project base. Start by creating a new folder under root/Application. Submit the PR from your own branch, as suggested in the contributing.md.

REQUIREMENT

  • Create a basic clean UI (Material compliant would be a plus)
  • Top: search bar (test with free rest endpoints)
  • Card view for all conferences
  • Single-page event view for selecting a recurring reminder. We are trying to avoid user registration; instead the application registers itself to the backend so that the backend can serve the notifications.

[Improve] change _id parameter for Conference model

The value of _id is currently the url of the conference itself, since the url identifies it uniquely. But an _id kept as a string incurs a performance cost. A better way of generating unique integers is required, based on the strings that are extracted. This is important since the _id is used for updating existing entries and has to be derived from the url of a conference.

REQUIREMENT

  1. Find a way to generate a unique number from a string (a hash mechanism)
  2. Make sure it exploits some pattern in the data rather than using md5 or sha directly, as those incur a compute cost. Keep them as a last option.

WHERE TO LOOK

  1. database/mdb.py module for Database implementation , (look at the implementation)
  2. datamodels/conference.py module for Conference. Check the implementation

Change Required @

  1. conference.py; localise the change here, or maybe move a hash function to utility and use it in this class
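As one hypothetical baseline for requirement 1, a non-cryptographic hash such as 64-bit FNV-1a maps a url to an integer far more cheaply than md5/sha. It does not exploit patterns in the data (requirement 2), so treat it as a sketch to beat rather than the asked-for scheme:

```python
def url_to_id(url: str) -> int:
    """Map a conference url to a 64-bit integer id via FNV-1a.

    FNV-1a is deterministic and much cheaper than md5/sha, so the same
    url always maps to the same _id and updates hit the same document.
    Collisions are possible in principle, which is why a scheme that
    exploits the structure of the urls would be stronger.
    """
    h = 0xcbf29ce484222325  # FNV-1a 64-bit offset basis
    for byte in url.encode("utf-8"):
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, mod 2^64
    return h
```

Because the mapping is deterministic, conference.py can compute the integer _id from the url at construction time without any database round-trip.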

[Feat.] landing page design for user application

REQUIREMENT

  1. Landing page consisting of a search bar / categories
  2. Some way to navigate to a list of conferences (maybe clicking on the search bar should open a top-conference list with the search bar on top)

[Feat.] Individual event details page layout.

On clicking any event card view, navigate to the details page of that event.

Use proper router-based navigation here, such that when clicking the browser forward or backward button the session is also managed.

[Bug] add close method to database and a process exit event handler for calling it.

Describe the bug
Currently the database connection is sustained even after the process is killed. With a language like Python we get destructors which can close the connection when the interpreter exits, but with Node.js we need to tap into the process exit event and close the connection.

Closing is necessary because leaving connections open is never safe: it creates unpredictable behaviour on both the application and the database server side.

Expected behavior
All connections should be closed when the application receives a kill signal.

[Feat.] add a new Scrapper for Guide2Research

The current scrapper collects information from wikicfp. Another large aggregator of such information is Guide2Research; they also provide journal impact factors and other information relevant to researchers. Follow the guidelines in Scrapper-Service to add your own scrapper.

[Improve] fix mongo indexing and update operation.

The current implementation indexes on url (unique) and deadline (sorting). But since multiple scrappers will be writing through the same interface, we must ensure that only the latest update of a particular conference is put in the db (overwriting allowed). Overwriting, however, is not possible with just a unique index and the upsert:True argument. The implementation needs to change, given that we want to avoid bringing data into the application side for any sort of verification.

[Feat.] add FastCache class responsible for providing in memory cache

For large repeated queries whose results do not change frequently, it's better to cache them than to evaluate them repeatedly. We can use node-cache as a lightweight solution for caching.

REQUIREMENT

  • Implement class FastCache in services , which will use node-cache internally and facilitate usage

Where to look

  • src/services/ implement here

PS: Currently the admins are attending to the class design for this and reading up on the docs. Anybody who wants to pitch in can work on it. 😃

[Feat.] add new stream filters like 'update'/'create'/'replace' etc for ConferenceStream

Is your feature request related to a problem? Please describe.
The project already has a changeStream observer set up against the mongo collection, but getStream() is a standard observer that emits all events. We want a few custom functions like getUpdateStream(), getInsertStream() etc. to detect these individual events and attach listeners to them.

Describe the solution you'd like
Implement

  • getInsertStream():Observable
  • getUpdateStream():Observable
  • getDeleteStream():Observable

in src/interfaces/service/stream/conferenceStream and src/service/stream/conferenceStream

Additional context
RxJS is already included in the environment, so make sure to use it for the filter and map operations over the observables.

Edit getOne() method to fetch the particular conference Detail

Describe the bug
Currently the getOne() function fetches the dummy testing conference.

To Reproduce
Steps to reproduce the behavior:
write a test file and call getOne() method

Expected behavior
The getOne() method should return a particular conference based on its id or another primary key

where to look?
Notifier-service/src/services/conference.ts

Guide 2 Research Scraper needs a redirect method to get pages

Is your feature request related to a problem? Please describe.
Due to the way Guide2Research works, the self.get_page method in the Scrapper base class cannot be used. There has been a discussion about this in #43.

Describe the solution you'd like
An additional method, or a modification to the current get_page method, to allow for redirects.

Describe alternatives you've considered
There's no alternative way that this problem could be handled.
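A hedged sketch of what such a method could look like, following redirects by hand so the scrapper can cap or log them. The `session` parameter is an assumption: any object with a requests-style get() works (a requests.Session fits), which also makes the logic testable without network access.

```python
def get_page_following_redirects(session, url, max_redirects=5, timeout=10):
    """Fetch a page, manually following HTTP redirect responses.

    `session` must expose get(url, timeout=..., allow_redirects=...)
    returning a response with status_code, headers and text. Redirects
    are followed explicitly, so the caller controls the redirect cap,
    unlike the base get_page.
    """
    for _ in range(max_redirects):
        resp = session.get(url, timeout=timeout, allow_redirects=False)
        if resp.status_code in (301, 302, 303, 307, 308):
            # Hop to the redirect target and try again.
            url = resp.headers["Location"]
            continue
        return resp.text
    raise RuntimeError(f"too many redirects for {url}")
```

Injecting the session this way also lets a unit test drive the method with a fake session that scripts the redirect chain.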

[Improv.] add a configuration validator in server.ts

In the Notifier-Service and Search-Service projects, the environment variables are loaded from .env files. It would be great to have a validator in the server.ts file which can check whether the required variables are all present.

REQUIREMENT

  • use the node validator package to check that the required environment entries are present
  • the code should reside in server.ts of Notifier-Service and Search-Service.

[Improv.] introduce Jest test framework for unit test in Notifier-Service

Currently, the Notifier service has its base routes, controllers and respective services ready. What we need now is a unit-test framework that will test the routes plus the service responses, with good predictable test cases to catch bugs early. For this we need the unit-test framework set up for the project.

REQUIREMENT

  • setup jest and package.json test commands properly
  • write basic unit tests to run for now.

[Improve] Remove pkg-resources from requirements.txt

We need to remove pkg-resources==0.0.0 from requirements.txt file here.

When I tried to install the dependencies, I came across the following error.

ERROR: Could not find a version that satisfies the requirement pkg-resources==0.0.0 (from -r requirements.txt (line 7)) (from versions: none)
ERROR: No matching distribution found for pkg-resources==0.0.0 (from -r requirements.txt (line 7))

[Feat.] build a CheckPoint system for the scrapper.

In case of a sudden failure and crash, we don't want the process to restart and start scraping from the beginning.

REQUIREMENT

  1. Add a method to the Database class inside db.py to store checkpoint information
  2. Let the user implementing a plugin decide how to use the checkpointing system.

Note: The Database class uses pymongo and contains no data that is modified after object creation, so it's safe to say it is thread safe.
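A sketch of the shape such a method pair on the Database class could take, backed here by a plain dict so the API is clear. The real implementation would persist to a mongo collection keyed by scrapper name (collection name and schema are assumptions), and each plugin decides what goes into `state`:

```python
class CheckpointStore:
    """Sketch of the checkpoint API the Database class could expose."""

    def __init__(self):
        # Stand-in for a `checkpoints` mongo collection (assumed name).
        self._store = {}

    def save_checkpoint(self, scrapper_name, state):
        # Overwrite the previous checkpoint for this scrapper.
        self._store[scrapper_name] = dict(state)

    def load_checkpoint(self, scrapper_name):
        # Return the last saved state, or None for a fresh start.
        return self._store.get(scrapper_name)
```

On startup a plugin would call load_checkpoint and, if it gets a state back, resume from there instead of scraping from the beginning.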

[Feat.] add getConference() query to Conference model in Notifier-Service

Currently there is only one dummy async function, getOne, for retrieving one document. The controller for the Conference model will require a few more queries, which should be contained in the Conference model class.

REQUIREMENT
-> add the required methods by following the controller requirements

Where to look
-> controllers/conference.ts
-> models/conference.ts
