GithubHelp home page GithubHelp logo

cdl-platform's Introduction

TextData

TextData is a social platform for collaboratively creating, sharing, and learning from wiki-style communities. We offer a stand-alone website and a Chrome extension, all for free.

To use TextData, you have three options:

  1. Full online version: Visit textdata.org, install the Chrome extension, create an account, and begin using TextData.
  2. Full offline version: Clone this repository, set up Docker, and run the services locally. This is described in the section below titled "Setting Up the Local Version".
  3. Hosted backend, local frontend: You can leverage the APIs for the backend, and create or extend your own frontend (website or browser extension). The API documentation is here.
Setting Up the Offline Version

Setting Up the Offline Version

Note that the local version is still under development:

  • No data is persisted; once the Docker containers stops, all data is lost.
  • "Reset Password" will not work due to no access to SendGrid.

Requirements for running locally

open powershell
wsl -d docker-desktop
sysctl -w vm.max_map_count=262144
  • With all of the Docker containers, packages, and models, the total size is ~10GB. Without Neural, it is ~3GB.

Configuring the env files

Copy the following to backend\env_local.ini:

api_url=http://localhost
api_port=8080
webpage_port=8080
jwt_secret=0047fa567bbddf121b23d1deaa7ff2af
redis_host=redis
redis_port=6379
redis_password=admin
cdl_uri=mongodb://mongodb:27017/?retryWrites=true&w=majority
cdl_test_uri=mongodb://localhost:27017
db_name=cdl-local
elastic_username=admin
elastic_password=admin
elastic_index_name=submissions
elastic_webpages_index_name=webpages
elastic_domain=http://host.docker.internal:9200/
elastic_domain_backfill=http://localhost:9200/

Copy the following to frontend\website\.env.local":

NEXT_PUBLIC_FROM_SERVER=http://host.docker.internal:8080/
NEXT_PUBLIC_FROM_CLIENT=http://localhost:8080/

Copy the following to frontend\extension\.env.local ":

REACT_APP_URL=http://localhost:8080/
REACT_APP_WEBSITE=http://localhost:8080/

Starting the services

Website, backend API, MongoDB, and OpenSearch:

Add the following to docker-compose.yml:

services:
    redis:
        image: redis:alpine
        command: redis-server --requirepass admin
        restart: always
        ports:
            - '6379:6379'

    reverseproxy:
        image: reverseproxy
        build:
            context: ./reverseproxy
            dockerfile: Dockerfile-local
        ports:
            - 8080:8080
        restart: always

    mongodb:
        image: mongo
        ports:
            - 27017:27017
        restart: always
        command: mongod --bind_ip 0.0.0.0

    opensearch-node1:
        image: opensearchproject/opensearch:latest
        container_name: opensearch-node1
        environment:
            - cluster.name=opensearch-cluster
            - node.name=opensearch-node1
            - discovery.seed_hosts=opensearch-node1
            - cluster.initial_cluster_manager_nodes=opensearch-node1
            - bootstrap.memory_lock=true
            - DISABLE_SECURITY_PLUGIN=true
            - "OPENSEARCH_JAVA_OPTS=-Xms512m -Xmx512m"
        ulimits:
            memlock:
                soft: -1
                hard: -1
            nofile:
                soft: 65536 # Maximum number of open files for the opensearch user - set to at least 65536
                hard: 65536
        ports:
        - 9200:9200 # REST API

    website:
        depends_on:
            - reverseproxy
        image: website
        build: ./frontend/website
        env_file: ./frontend/website/.env.local
        restart: always
        extra_hosts:
            - "host.docker.internal:host-gateway"

    api:
        depends_on:
            - reverseproxy
            - redis
            - mongodb
            - opensearch-node1
        image: api
        build: ./backend
        restart: always
        env_file: ./backend/env_local.ini

Run the docker-compose file: docker-compose -f docker-compose.yml up -d --build

To stop: docker-compose -f docker-compose.yml down

If you would like to add the neural options (reranking, generation), add the following to backend\env_local.ini and restart docker-compose:

neural_api=http://host.docker.internal:9300/

Note that the slashes need to be reversed if running on Mac/Linux (above is written for windows).

Then, navigate to the neural folder, add the following to env_neural_prod.ini

hf_token=<your huggingface token>

Finally, to start the neural docker container (requires GPU), run the following from the neural folder:

docker build -t .
docker run --gpus --env-file env_neural_prod.ini -p 9300:80 hash_of_above_image

Extension:

Navigate to frontend\extension and run npm ci and then run npm run build. Then upload the build file to Chrome while using Development Mode. Once uploaded, open the extention, go to the setting section and chnage the backend source from textdata.org to other and click on Save.

Running Test cases

Note: Local Docker containers must be up and running before you run below commands

cd <project-directory>\backend
pytest .\tests\test_server.py

Running the Back-Fill script

Note: Local Docker containers must be up and running before you run below commands

cd <project-directory>\backend
python .\app\helpers\backfill.py [--env_path] [--type=<"submissions" or "webpages">]

Here, --env_path is an optional argument that takes the path to the environment file, and the default file considered is backend\env_local.ini.

The --type is another optional argument that takes two values: submissions or webpages, and the default value is submissions.

Building on Top of the Hosted Backend

Building on Top of the Hosted Backend

See the API documentation here. Please be courteous regarding the amount of API calls so that the backend servers do not get overwhelmed.

Contributors

How can I contribute?
For any single bug fix or small feature: fork this repository, make a pull request, and describe the change in the request.

For a longer-term collaboration, big feature, or large change, please send an email to [email protected].

  • Kevin Ros is a 4th year Ph.D. student at the University of Illinois Urbana Champaign. This is the main component of his thesis project. He built the initial version of the platform and has led its development since its beginning (September 2022).
  • ChengXiang Zhai is Kevin's advisor, and he has played a crucial role in shaping the vision. Moreover, he has provided the grant funding to support the infrastructure, the research, and the development.
  • Current contributors: Rakshana Jayaprakash, Dhyey Pandya, Kedar Takwane, and Sharath Chandra.
  • Past contributors: Ashwin Patil, Alvin Zhang, Nikhitha Reddeddy, Heth Gala

Publications

  • Kevin Ros and ChengXiang Zhai. 2023. The CDL: An Online Platform for Creating Community-based Digital Libraries. In Computer Supported Cooperative Work and Social Computing (CSCW ’23 Companion), October 14–18, 2023, Minneapolis, MN, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3584931.3607495
  • Kevin Ros, Matthew Jin, Jacob Levine, and ChengXiang Zhai. 2023. Retrieving Webpages Using Online Discussions. In Proceedings of the 2023 ACM SIGIR International Conference on the Theory of Information Retrieval (ICTIR ’23), July 23, 2023, Taipei, Taiwan. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3578337.3605139

cdl-platform's People

Contributors

kevinros avatar rakshanaj17 avatar kedartakwane avatar amrutha00 avatar theashwin avatar choprahetarth avatar dhp98 avatar ojasviagarwal avatar afesuazo avatar

Stargazers

 avatar SHANKAR KUMAR S avatar shangrex avatar  avatar Mihir Pamnani avatar Zihan Yan avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

cdl-platform's Issues

Batch upload to support not including optional source URL

The website supports submissions without a source URL (e.g., a note by itself, not about a website). But the batch-upload feature still checks to makes sure a URL is present, and will fail otherwise.

Instead, the batch upload should allow for an empty source URL field on submissions.

Add markdown editing abilities in extension

Currently, the extension submission process differs from the website submission process - the extension does not support markdown editing / formatting on submission.

To fix: update the extension to be able to support markdown editing when creating a submission.

Visualization returns application error

Haven't been able to reproduce yet, but clicking around in the search result visualization for a bit sometimes leads to an application error.

Will try to reproduce and add more. Happens on dev and in prod.

Update icons to be more meaningful + add hover text

  1. The logout and leave community are the same symbol but different actions. Can change the leave community icon to something else.
  2. On a submission page, can change the reply and feedback feature icons to make them clearer.
  3. Rename "Feedback" to "Report Issue".
  4. Rename "Share URL" to "Copy URL"

Also, can add hover text for icons. For example,

  • Reply --> "Create a submission in reply to this submission"
  • Feedback --> "Send a message to the TextData development team about an issue with this submission".

Add caching in extension

When a user opens the extension and either (1) views/searches for results, or (2) starts to edit a submission, all is lost of the extension is closed. Not only does this make for a poor user experience, but it also adds unnecessary API calls when a user clicks a search result (opens a new page, then the extension closes).

To fix this:

  1. Add a submission cache in the browser where the draft of the submission is saved in browser
  2. Add a result cache in the browser where the most recent search result page is saved in browser

In both cases, can save according to the URL of the webpage as the key of the cache.

Reorganize submission page

The website's submission page should probably use empty space more efficiently by

  1. Minimize the buttons (edit, delete, share, etc.)
  2. Make the submission note content larger (examples from Confluence, notion)
  3. Provide a toggle to show the graph, replies, etc.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.