anna-liepina / explore-cwa-react

License: MIT License

Dockerfile 0.78% Makefile 1.73% JavaScript 16.10% HTML 0.20% SCSS 7.33% TypeScript 73.86%
big-data docker education geospatial graphql learn-to-code makefile non-profit reactjs typescript

explore-cwa-react's Introduction


'Explore Me CWA' [client web application]

This project is centered around parsing various datasets, including UK government data on property sales, police reporting data, and postcode data. The goal is to use geographical information to establish connections between postcodes via latitude and longitude.

The primary objective is to develop a scalable GraphQL backend capable of swiftly delivering requested results. This work highlights the trickier aspects of GraphQL in practice, such as the N+1 problem and scaling scenarios where separate database nodes are required for writes and reads.
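
For context, the standard mitigation for the N+1 problem in a Node.js GraphQL backend is request-scoped batching; the sketch below uses the dataloader package with an illustrative db stub, neither of which is taken from this repository:

```js
// sketch of request batching with the 'dataloader' package; 'db' is a
// stand-in stub, not this repository's actual data layer
const DataLoader = require('dataloader');

const db = {
    // pretend query: one row per requested postcode, single round-trip
    query: async (codes) => codes.map((code) => ({ code, lat: 0, lng: 0 })),
};

// all .load() calls issued in the same tick collapse into one batch
const postcodeLoader = new DataLoader(async (codes) => {
    const rows = await db.query(codes);
    const byCode = new Map(rows.map((row) => [row.code, row]));
    // DataLoader expects results in the same order as the requested keys
    return codes.map((code) => byCode.get(code) || null);
});

// a field resolver can now load per item without issuing N separate queries
const resolvers = {
    Property: {
        postcode: (property) => postcodeLoader.load(property.postcode),
    },
};
```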

Key features of the project include a robust automated Quality Assurance (QA) system, incorporating anonymized data seeding for comprehensive QA testing. The project also explores the limits of JavaScript itself: notably, a plain V8 object can hold only around ~8.4 million keys, while the Map data structure copes with considerably more.
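
A rough way to observe these limits is to insert keys until V8 refuses more; the probe below is only a sketch (exact figures and failure behavior vary by Node.js/V8 version, and it needs a large heap, e.g. node --max-old-space-size=8192):

```js
// inserts keys until the structure refuses more; plain objects typically
// stop around ~8.4M keys, while Map holds roughly twice as many
function probe(label, insert) {
    let count = 0;
    try {
        for (; count < 20_000_000; count++) insert(count);
    } catch (e) {
        // V8 raises a RangeError once capacity is exhausted (version-dependent)
    }
    console.log(label, 'held', count, 'entries');
}

const obj = Object.create(null);
probe('object', (i) => { obj['k' + i] = i; });

const map = new Map();
probe('map', (i) => { map.set('k' + i, i); });
```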

Additionally, the project incorporates a queue system to enhance the efficiency of data processing. In essence, the project serves as a practical demonstration of diverse and advanced aspects of software development, reflecting a commitment to excellence and innovation.

Live DEMO

software requirements

if you're using make commands, docker and docker-compose are required; a local Node.js with npm is optional

used technologies

used services

how to install

  • with make commands, no additional steps are required; otherwise you need to execute $ npm i

how to run tests

  • end-to-end 'cypress' tests: $ make sync to fetch the GraphQL backend as a git submodule, then $ make cypress (a minimal spec sketch follows below)
    • the npm analogue requires booting up CWA & SA and linking them together, then cd cypress && npm test
  • functional 'jest' tests: $ make test or $ npm test
    • optional 'jest' CLI params, examples:
      • to collect coverage: $ npm test -- --coverage, the report will be located in the ./coverage directory
      • to run tests only in a specific file: $ npm test src/validation/rules.test.js
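
For orientation, a minimal Cypress spec could look like the sketch below; the assertions are illustrative assumptions, the real specs live in the ./cypress directory:

```js
// hypothetical spec; the title assertion assumes the default
// REACT_APP_TITLE listed in the environment variables table below
describe('data explorer', () => {
    it('loads the landing page', () => {
        cy.visit('/'); // relies on baseUrl pointing at the running CWA
        cy.title().should('contain', 'DATA EXPLORER');
    });
});
```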

how to run in 'development' mode

  • $ make or $ npm start

how to run in 'production' mode

  • $ make serve; there is no npm equivalent
  • if you only need to generate static assets
    • $ make build or $ npm run build - generated assets will be located in the ./build directory

how to run containers with different variables using 'make'

  • example: make PORT=18080

gitflow

  • heroku -> current production; contains production-specific changes and triggers a deploy on AWS on every push
  • master -> the most up-to-date production-ready branch; all pull requests into it must pass the mandatory checks 'ci/circleci: jest' and 'ci/circleci: cypress'
  • feature branches -> merged into master once they are ready and the mandatory checks have passed
  • CI executes tests in an isolated environment

used environment variables

| variable          | default value    | type   | purpose                                              |
|-------------------|------------------|--------|------------------------------------------------------|
| PORT              | 8080             | number | port on which the application will be made available |
| REACT_APP_GRAPHQL | //localhost:8081 | string | GraphQL backend URI                                  |
| REACT_APP_TITLE   | DATA EXPLORER    | string | website's title                                      |
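
Assuming the client is a Create React App build (which the REACT_APP_ prefix suggests), these variables are inlined at build time and read via process.env; a minimal sketch:

```js
// sketch: REACT_APP_* values are baked in at build time by react-scripts
const graphqlUri = process.env.REACT_APP_GRAPHQL || '//localhost:8081';
document.title = process.env.REACT_APP_TITLE || 'DATA EXPLORER';
```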

DEMO

overview

[screenshot: Area Overview]

properties

[screenshot: Properties]

incidents

[screenshot: Incidents]

explore-cwa-react's People

Contributors

anna-liepina, eugene-matvejev


explore-cwa-react's Issues

implement Markers API

Purpose:
The implementation of the Markers API aims to enhance data retrieval efficiency by avoiding full scans, reducing unnecessary data fetches, and enabling the effective utilization of NoSQL databases.

Benefits:
Reduced Information Fetching:
By utilizing the Markers API, we avoid fetching extra information, leading to a more streamlined and efficient data retrieval process.
NoSQL Compatibility:
The Markers API lets us use NoSQL databases more effectively: a marker ID or latitude/longitude can be used to fetch data while staying within MongoDB's 16 MB per-document limit.

Technical Implementation:

Marker ID or Lat/Lng Usage:
Utilize the unique Marker ID or latitude/longitude information as key parameters in the API to retrieve specific data points, eliminating the need for extensive scans.
Integration with NoSQL Database:
Ensure seamless integration with NoSQL databases, such as MongoDB, to fully leverage their capabilities for optimized data storage and retrieval.
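
For illustration, a hedged sketch of the bounding-box lookup with the official MongoDB Node.js driver is shown below; the connection string, database, collection, and field names are assumptions, not taken from this project:

```js
// hypothetical lookup by bounding box; URI, database, collection and field
// names are illustrative assumptions, not this project's actual schema
const { MongoClient } = require('mongodb');

async function markersInBox(swLng, swLat, neLng, neLat) {
    const client = await MongoClient.connect('mongodb://localhost:27017');
    try {
        const markers = client.db('explore').collection('markers');
        // with a 2d index on 'location', $geoWithin/$box avoids a full scan;
        // heavy payloads are fetched separately by marker id, keeping each
        // document comfortably below MongoDB's 16 MB limit
        return await markers
            .find({ location: { $geoWithin: { $box: [[swLng, swLat], [neLng, neLat]] } } })
            .project({ _id: 1, location: 1 })
            .toArray();
    } finally {
        await client.close();
    }
}
```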

Context:
To address the current challenges with full scans and to prepare the system for future scale, the introduction of the Markers API is crucial. This approach aligns with best practices for efficient data handling and complements our transition to NoSQL databases.

Let's discuss and plan the implementation details to ensure a smooth integration that maximizes the benefits of the Markers API.

Your insights and feedback on this proposal are highly valuable.

Implement GitHub Actions

replicate the CI checks in GitHub Actions

This will give us a smoother developer experience as well as better integration with GitHub.

reduce storage requirements for data

My investigation into database memory usage revealed that approximately 20% of storage is consumed by unused .id fields, and subsequent checks confirmed that these fields are not referenced anywhere in the application. To optimize memory consumption, we propose dropping these unused .id fields, aiming for a significant reduction in usage. The implementation steps:
  • back up the database
  • verify the absence of .id field usage in the code
  • modify the schema to remove these fields (see the migration sketch below)
  • update foreign key references if necessary
  • conduct thorough testing
  • update the documentation
We invite feedback and insights to ensure a collaborative and successful optimization effort.
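
As an illustration of the schema-modification step, a hypothetical migration using knex is sketched below; the table and column names are placeholders, since the real schema lives in the GraphQL backend:

```js
// hypothetical knex migration; 'postcodes' and 'id' are placeholder names
exports.up = (knex) =>
    knex.schema.alterTable('postcodes', (table) => {
        table.dropColumn('id'); // the unused field identified above
    });

exports.down = (knex) =>
    knex.schema.alterTable('postcodes', (table) => {
        table.increments('id'); // restore an auto-incrementing id on rollback
    });
```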

reduce Production Cost bill

Hi Team,

As we gear up for rapid development, it's crucial to consider the impact on our CI costs, especially with each merge triggering deployment. Given our current funding constraints and budget considerations, it might be prudent to reassess our deployment strategy.

Proposed Solution:
Instead of triggering deployments with every merge, let's explore the option of deploying once a day or even less frequently. This approach can help optimize our CI costs and ensure more efficient resource utilization.

Considerations:
Budget Constraints: Given our limited funding, deploying less frequently can significantly mitigate the impact on CI costs.
Heroku Branch Repurposing: We could repurpose the existing Heroku branch for this purpose. Alternatively, we might want to consider creating a new branch specifically for less frequent deployments.
Balancing Speed and Cost: While rapid development is a priority, finding the right balance between speed and cost-effectiveness is crucial. This adjustment allows us to maintain a steady development pace without compromising our budget.

Let's discuss this proposal further in our next meeting and decide on the most suitable approach for our current situation.

Your feedback on this is highly valuable.

SPIKE: data federation

Objective:
Explore data federation solutions to host up to 1 terabyte of data, considering cost-effectiveness and the current absence of funding for AWS/Azure/GCP.

Context:
Hosting data on AWS/Azure/GCP is currently deemed too expensive, given our current funding limitations. However, we have the opportunity to host approximately 200 GB for free through data federation. While this presents a cost-effective alternative, it comes with its set of challenges.

Challenges and Considerations:
Data Federation Limitations:

Theoretical Limit of 1 Terabyte:
Covering the UK territory requires close to 1 terabyte, so we aim to explore data federation solutions that can accommodate up to that theoretical limit.
Challenges in Managing Larger Datasets:
Hosting larger datasets through data federation might pose challenges in terms of performance, scalability, and maintenance.

Cost-Effectiveness:
Exploring Free Hosting Options:
Given our current financial constraints, the focus is on solutions that align with a limited or no-cost hosting model.
Open Ticket for Suggestions:
This ticket serves as an open call for suggestions from the team on viable, cost-effective data federation solutions.

Proposed Discussion Points:
Data Federation Platforms:
Identify data federation platforms that offer cost-effective hosting solutions. Explore their scalability and performance, especially when dealing with datasets nearing the 1 terabyte limit.
Community or Open-Source Solutions:
Investigate community-driven or open-source solutions that might provide a cost-effective hosting option for larger datasets. Consider the feasibility and support available for implementing such solutions.
Best Practices and Recommendations:
Gather insights and best practices from team members or external sources who have experience with cost-effective data federation solutions. Explore success stories and potential pitfalls to make informed decisions.

Next Steps:
Let's collaborate on this open ticket, share insights, and collectively explore viable options for hosting our data federation up to the theoretical limit of 1 terabyte. Your suggestions and expertise in this matter are highly valuable.

IMPORTANT:
Discuss and strategize for potential vendor lock-in scenarios if applying for startup funding from GCP/AWS/Azure. Establish mechanisms for efficient data set merging/splitting in case of vendor lock-in.

increase coverage to at least 80%

The goal of this task is to improve the code coverage of our project to at least 80% following the 80/20 rule. Code coverage helps ensure the reliability and maintainability of our codebase.
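
One way to enforce the target (assuming the default Jest setup that this repo's `npm test` implies) is a coverageThreshold entry in the Jest config; a sketch:

```js
// jest.config.js — sketch of enforcing the 80% goal; the repo's actual
// configuration (possibly embedded in package.json) may differ
module.exports = {
    collectCoverage: true,
    coverageDirectory: 'coverage',
    coverageThreshold: {
        // the test run fails when any global metric drops below 80%
        global: { branches: 80, functions: 80, lines: 80, statements: 80 },
    },
};
```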

Implement ESLint rules to enhance code quality and ensure scalability

Task Details:
Default ESLint Rules:
Ensure that the default ESLint rules are correctly applied and meet our expected coding standards.

Consider Implementing the unicorn Plugin:
Apart from the default presets, consider integrating eslint-plugin-unicorn to further enhance code quality. It can help eliminate instances of .filter()[0] and encourage more efficient methods such as .find().

Code Optimization for Scalability:
Given the anticipated growth of our data to double-digit terabytes, it's crucial to optimize the code to prevent any potential full scans. Addressing this from the beginning is more efficient than applying quick fixes later on.

Avoiding .filter()[0]:
Replace occurrences of .filter()[0] with the more efficient .find() method to prevent unnecessary array processing, as shown in the sketch below.
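
A minimal before/after illustration (eslint-plugin-unicorn's prefer-array-find rule flags the former):

```js
// before: .filter()[0] walks the entire array even after a match is found
const incidents = [{ id: 1 }, { id: 2 }, { id: 3 }];
const viaFilter = incidents.filter((incident) => incident.id === 2)[0];

// after: .find() short-circuits on the first match
// (flagged automatically by the 'unicorn/prefer-array-find' rule)
const viaFind = incidents.find((incident) => incident.id === 2);

console.log(viaFilter, viaFind); // identical results, different cost
```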
Context:
As discussed, the scale of our data is expected to reach double-digit terabytes, so it is paramount to proactively avoid any full scans to ensure optimal performance. Taking these measures early is a strategic approach that avoids the need for "duct-tape" solutions later.

Let's collaborate on implementing these enhancements to maintain code quality, adhere to best practices, and ensure our system is well-prepared for future scalability challenges.

Your input on this is highly appreciated.
