anna-liepina / explore-cwa-react

License: MIT License

Dockerfile 0.78% Makefile 1.73% JavaScript 16.10% HTML 0.20% SCSS 7.33% TypeScript 73.86%
big-data docker education geospatial graphql learn-to-code makefile non-profit reactjs typescript

explore-cwa-react's Introduction


'Explore Me CWA' [client web application]

This project is centered around parsing various datasets, including UK government data on property sales, police reporting data, and postcode data. The goal is to use geographical information to establish connections between postcodes via latitude and longitude.

The primary objective is to develop a scalable GraphQL backend capable of swiftly delivering requested results. This work highlights the trickier aspects of GraphQL in practice, such as the N+1 problem and scaling scenarios where separate database nodes are required for writes and reads.
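
For context, the standard mitigation for the N+1 problem in a Node.js GraphQL backend is request-scoped batching; the sketch below uses the dataloader package with an illustrative db stub, neither of which is taken from this repository:

```js
// sketch of request batching with the 'dataloader' package; 'db' is a
// stand-in stub, not this repository's actual data layer
const DataLoader = require('dataloader');

const db = {
    // pretend query: one row per requested postcode, single round-trip
    query: async (codes) => codes.map((code) => ({ code, lat: 0, lng: 0 })),
};

// all .load() calls issued in the same tick collapse into one batch
const postcodeLoader = new DataLoader(async (codes) => {
    const rows = await db.query(codes);
    const byCode = new Map(rows.map((row) => [row.code, row]));
    // DataLoader expects results in the same order as the requested keys
    return codes.map((code) => byCode.get(code) || null);
});

// a field resolver can now load per item without issuing N separate queries
const resolvers = {
    Property: {
        postcode: (property) => postcodeLoader.load(property.postcode),
    },
};
```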

Key features of the project include a robust automated Quality Assurance (QA) system, incorporating anonymized data seeding for comprehensive QA testing. The project also explores the limits of JavaScript itself: notably, a plain V8 object can hold only around ~8.4 million keys, while the Map data structure copes with considerably more.
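
A rough way to observe these limits is to insert keys until V8 refuses more; the probe below is only a sketch (exact figures and failure behavior vary by Node.js/V8 version, and it needs a large heap, e.g. node --max-old-space-size=8192):

```js
// inserts keys until the structure refuses more; plain objects typically
// stop around ~8.4M keys, while Map holds roughly twice as many
function probe(label, insert) {
    let count = 0;
    try {
        for (; count < 20_000_000; count++) insert(count);
    } catch (e) {
        // V8 raises a RangeError once capacity is exhausted (version-dependent)
    }
    console.log(label, 'held', count, 'entries');
}

const obj = Object.create(null);
probe('object', (i) => { obj['k' + i] = i; });

const map = new Map();
probe('map', (i) => { map.set('k' + i, i); });
```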

Additionally, the project incorporates a queue system to enhance the efficiency of data processing. In essence, the project serves as a practical demonstration of diverse and advanced aspects of software development, reflecting a commitment to excellence and innovation.

Live DEMO

software requirements

if you're using make commands, docker and docker-compose are required; a local Node.js with npm is optional

used technologies

used services

how to install

  • with make commands, no additional steps are required; otherwise you need to execute $ npm i

how to run tests

  • end-to-end 'cypress' tests: $ make sync to fetch the GraphQL backend as a git submodule, then $ make cypress (a minimal spec sketch follows below)
    • the npm analogue requires booting up CWA & SA and linking them together, then cd cypress && npm test
  • functional 'jest' tests: $ make test or $ npm test
    • optional 'jest' CLI params, examples:
      • to collect coverage: $ npm test -- --coverage, the report will be located in the ./coverage directory
      • to run tests only in a specific file: $ npm test src/validation/rules.test.js
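
For orientation, a minimal Cypress spec could look like the sketch below; the assertions are illustrative assumptions, the real specs live in the ./cypress directory:

```js
// hypothetical spec; the title assertion assumes the default
// REACT_APP_TITLE listed in the environment variables table below
describe('data explorer', () => {
    it('loads the landing page', () => {
        cy.visit('/'); // relies on baseUrl pointing at the running CWA
        cy.title().should('contain', 'DATA EXPLORER');
    });
});
```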

how to run in 'development' mode

  • $ make or $ npm start

how to run in 'production' mode

  • $ make serve; there is no npm equivalent
  • if you only need to generate static assets
    • $ make build or $ npm run build - generated assets will be located in the ./build directory

how to run containers with different variables using 'make'

  • example: make PORT=18080

gitflow

  • heroku -> current production; contains production-specific changes and triggers a deploy on AWS on every push
  • master -> the most up-to-date production-ready branch; all pull requests into it must pass the mandatory checks 'ci/circleci: jest' and 'ci/circleci: cypress'
  • feature branches -> merged into master once they are ready and the mandatory checks have passed
  • CI executes tests in an isolated environment

used environment variables

| variable          | default value    | type   | purpose                                              |
|-------------------|------------------|--------|------------------------------------------------------|
| PORT              | 8080             | number | port on which the application will be made available |
| REACT_APP_GRAPHQL | //localhost:8081 | string | GraphQL backend URI                                  |
| REACT_APP_TITLE   | DATA EXPLORER    | string | website's title                                      |
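
Assuming the client is a Create React App build (which the REACT_APP_ prefix suggests), these variables are inlined at build time and read via process.env; a minimal sketch:

```js
// sketch: REACT_APP_* values are baked in at build time by react-scripts
const graphqlUri = process.env.REACT_APP_GRAPHQL || '//localhost:8081';
document.title = process.env.REACT_APP_TITLE || 'DATA EXPLORER';
```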

DEMO

overview

[screenshot: Area Overview]

properties

[screenshot: Properties]

incidents

[screenshot: Incidents]

explore-cwa-react's People

Contributors

anna-liepina, eugene-matvejev


explore-cwa-react's Issues

implement Markers API

Purpose:
The implementation of the Markers API aims to enhance data retrieval efficiency by avoiding full scans, reducing unnecessary data fetches, and enabling the effective utilization of NoSQL databases.

Benefits:
Reduced Information Fetching:
By utilizing the Markers API, we avoid fetching extra information, leading to a more streamlined and efficient data retrieval process.
NoSQL Compatibility:
The Markers API lets us use NoSQL databases more effectively: a marker ID or latitude/longitude can be used to fetch data while staying within MongoDB's 16 MB per-document limit.

Technical Implementation:

Marker ID or Lat/Lng Usage:
Utilize the unique Marker ID or latitude/longitude information as key parameters in the API to retrieve specific data points, eliminating the need for extensive scans.
Integration with NoSQL Database:
Ensure seamless integration with NoSQL databases, such as MongoDB, to fully leverage their capabilities for optimized data storage and retrieval.
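
For illustration, a hedged sketch of the bounding-box lookup with the official MongoDB Node.js driver is shown below; the connection string, database, collection, and field names are assumptions, not taken from this project:

```js
// hypothetical lookup by bounding box; URI, database, collection and field
// names are illustrative assumptions, not this project's actual schema
const { MongoClient } = require('mongodb');

async function markersInBox(swLng, swLat, neLng, neLat) {
    const client = await MongoClient.connect('mongodb://localhost:27017');
    try {
        const markers = client.db('explore').collection('markers');
        // with a 2d index on 'location', $geoWithin/$box avoids a full scan;
        // heavy payloads are fetched separately by marker id, keeping each
        // document comfortably below MongoDB's 16 MB limit
        return await markers
            .find({ location: { $geoWithin: { $box: [[swLng, swLat], [neLng, neLat]] } } })
            .project({ _id: 1, location: 1 })
            .toArray();
    } finally {
        await client.close();
    }
}
```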

Context:
To address the current challenges with full scans and to prepare the system for future scale, the introduction of the Markers API is crucial. This approach aligns with best practices for efficient data handling and complements our transition to NoSQL databases.

Let's discuss and plan the implementation details to ensure a smooth integration that maximizes the benefits of the Markers API.

Your insights and feedback on this proposal are highly valuable.

Implement GitHub Actions

replicate the CI checks in GitHub Actions

This will give us a smoother developer experience as well as better integration with GitHub.

reduce storage requirements for data

My investigation into database memory usage revealed that approximately 20% of storage is consumed by unused .id fields, and subsequent checks confirmed that these fields are not referenced anywhere in the application. To optimize memory consumption, we propose dropping these unused .id fields, aiming for a significant reduction in usage. The implementation steps:
  • back up the database
  • verify the absence of .id field usage in the code
  • modify the schema to remove these fields (see the migration sketch below)
  • update foreign key references if necessary
  • conduct thorough testing
  • update the documentation
We invite feedback and insights to ensure a collaborative and successful optimization effort.
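
As an illustration of the schema-modification step, a hypothetical migration using knex is sketched below; the table and column names are placeholders, since the real schema lives in the GraphQL backend:

```js
// hypothetical knex migration; 'postcodes' and 'id' are placeholder names
exports.up = (knex) =>
    knex.schema.alterTable('postcodes', (table) => {
        table.dropColumn('id'); // the unused field identified above
    });

exports.down = (knex) =>
    knex.schema.alterTable('postcodes', (table) => {
        table.increments('id'); // restore an auto-incrementing id on rollback
    });
```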

reduce Production Cost bill

Hi Team,

As we gear up for rapid development, it's crucial to consider the impact on our CI costs, especially with each merge triggering deployment. Given our current funding constraints and budget considerations, it might be prudent to reassess our deployment strategy.

Proposed Solution:
Instead of triggering deployments with every merge, let's explore the option of deploying once a day or even less frequently. This approach can help optimize our CI costs and ensure more efficient resource utilization.

Considerations:
Budget Constraints: Given our limited funding, deploying less frequently can significantly mitigate the impact on CI costs.
Heroku Branch Repurposing: We could repurpose the existing Heroku branch for this purpose. Alternatively, we might want to consider creating a new branch specifically for less frequent deployments.
Balancing Speed and Cost: While rapid development is a priority, finding the right balance between speed and cost-effectiveness is crucial. This adjustment allows us to maintain a steady development pace without compromising our budget.

Let's discuss this proposal further in our next meeting and decide on the most suitable approach for our current situation.

Your feedback on this is highly valuable.

SPIKE: data federation

Objective:
Explore data federation solutions to host up to 1 terabyte of data, considering cost-effectiveness and the current absence of funding for AWS/Azure/GCP.

Context:
Hosting data on AWS/Azure/GCP is currently deemed too expensive, given our current funding limitations. However, we have the opportunity to host approximately 200 GB for free through data federation. While this presents a cost-effective alternative, it comes with its set of challenges.

Challenges and Considerations:
Data Federation Limitations:

Theoretical Limit of 1 Terabyte:
Covering the UK territory requires close to 1 terabyte, so we aim to explore data federation solutions that can accommodate up to that theoretical limit.
Challenges in Managing Larger Datasets:
Hosting larger datasets through data federation might pose challenges in terms of performance, scalability, and maintenance.

Cost-Effectiveness:
Exploring Free Hosting Options:
Given our current financial constraints, the focus is on solutions that align with a limited or no-cost hosting model.
Open Ticket for Suggestions:
This ticket serves as an open call for suggestions from the team on viable, cost-effective data federation solutions.

Proposed Discussion Points:
Data Federation Platforms:
Identify data federation platforms that offer cost-effective hosting solutions. Explore their scalability and performance, especially when dealing with datasets nearing the 1 terabyte limit.
Community or Open-Source Solutions:
Investigate community-driven or open-source solutions that might provide a cost-effective hosting option for larger datasets. Consider the feasibility and support available for implementing such solutions.
Best Practices and Recommendations:
Gather insights and best practices from team members or external sources who have experience with cost-effective data federation solutions. Explore success stories and potential pitfalls to make informed decisions.

Next Steps:
Let's collaborate on this open ticket, share insights, and collectively explore viable options for hosting our data federation up to the theoretical limit of 1 terabyte. Your suggestions and expertise in this matter are highly valuable.

IMPORTANT:
Discuss and strategize for potential vendor lock-in scenarios if applying for startup funding from GCP/AWS/Azure. Establish mechanisms for efficient data set merging/splitting in case of vendor lock-in.

increase coverage to at least 80%

The goal of this task is to improve the code coverage of our project to at least 80% following the 80/20 rule. Code coverage helps ensure the reliability and maintainability of our codebase.
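
One way to enforce the target (assuming the default Jest setup that this repo's `npm test` implies) is a coverageThreshold entry in the Jest config; a sketch:

```js
// jest.config.js — sketch of enforcing the 80% goal; the repo's actual
// configuration (possibly embedded in package.json) may differ
module.exports = {
    collectCoverage: true,
    coverageDirectory: 'coverage',
    coverageThreshold: {
        // the test run fails when any global metric drops below 80%
        global: { branches: 80, functions: 80, lines: 80, statements: 80 },
    },
};
```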

Implement ESLint rules to enhance code quality and ensure scalability

Task Details:
Default ESLint Rules:
Ensure that the default ESLint rules are correctly applied and meet our expected coding standards.

Consider Implementing the unicorn Plugin:
Apart from the default presets, consider integrating eslint-plugin-unicorn to further enhance code quality. It can help eliminate instances of .filter()[0] and encourage more efficient methods such as .find().

Code Optimization for Scalability:
Given the anticipated growth of our data to double-digit terabytes, it's crucial to optimize the code to prevent any potential full scans. Addressing this from the beginning is more efficient than applying quick fixes later on.

Avoiding .filter()[0]:
Replace occurrences of .filter()[0] with the more efficient .find() method to prevent unnecessary array processing, as shown in the sketch below.
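
A minimal before/after illustration (eslint-plugin-unicorn's prefer-array-find rule flags the former):

```js
// before: .filter()[0] walks the entire array even after a match is found
const incidents = [{ id: 1 }, { id: 2 }, { id: 3 }];
const viaFilter = incidents.filter((incident) => incident.id === 2)[0];

// after: .find() short-circuits on the first match
// (flagged automatically by the 'unicorn/prefer-array-find' rule)
const viaFind = incidents.find((incident) => incident.id === 2);

console.log(viaFilter, viaFind); // identical results, different cost
```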
Context:
As discussed, the scale of our data is expected to reach double-digit terabytes, so it is paramount to proactively avoid any full scans to ensure optimal performance. Taking these measures early is a strategic approach that avoids the need for "duct-tape" solutions later.

Let's collaborate on implementing these enhancements to maintain code quality, adhere to best practices, and ensure our system is well-prepared for future scalability challenges.

Your input on this is highly appreciated.
