mochoabuilds / autonomous-retail-checkout

computer vision app - normalize the idea of retailers providing digital currency for autonomous checkout directly to guests (2020-2021)

autonomous-retail-checkout's Introduction

geaux

1 WHY?

  • As the lines between banks, tech and retail blur, we must ask how technology is changing the checkout experience. Retailers may need to prepare for a long-term shift in how money and checkout work. We must not forget that money has no weight; it just does real things in our world. Our code runs on the idea that "people don't want to make payments, they want to do what a payment facilitates". Our proof of concept automates the dull, repetitive task of scanning goods by having a camera, a processor and many lines of code run autonomous checkout software. We believe shoppers want choice: sometimes they will want autonomous checkout, and sometimes they will want to check out with a cashier. This choice also gives workers some freedom from the register so they can focus on the things that make a retailer feel more like a "third place".
  • Our moonshot is to normalize the idea of retailers providing digital currency for autonomous checkout directly to guests. Imagine walking into a retailer, "checking in" with your smartphone, browsing and shopping, and then walking out. This pre-pay approach means transactions could also be settled instantly on in-house digital ledgers rather than over the pipes of card networks and banks. Pulling transactions out of the banking system would also cut the processing costs retailers pay, which amount to 1-3% per transaction.

2 RESEARCH GOALS

  • As with any AI and machine learning system, the syllabus determines the outcome. Our AI's syllabus focuses on the political value of time at in-store checkout, including aspects we may not have considered ourselves. We build the system to think like an ethnographer, sensitive to the meanings that members bring to each space. The subroutines are written so that the analog world's biases about the political value of time are not repeated in the AI and machine learning world. Questions such as "Who benefits and who is harmed by this application?" and "Does it put power into the hands of the already powerful?" are asked at every step of the software development process. This approach is meant to build diversity and sustainability into every step of the system.
  • Additionally, this project was created to showcase my ability to write software that generates revenue at the graduate level.

3 HOW?

  • 1 - ASK "WHAT SYSTEM MUST DO" by analyzing the problem with shoppers and experts in the field to determine what is required of the solution
  • 2 - MODEL REQUIREMENT for OBJECTS using bottom-up approach that looks for objects that closely model the real world
  • 3 - NAME the CLASSES and METHODS that carry out these RESPONSIBILITIES from step 2
  • 4 - REFINE the CLASS INTERFACE and DETERMINE: CLASSNAME, RESPONSIBILITIES and COLLABORATOR CLASSES to build cohesion
  • 5 - DEVELOP the MESSAGE SENDING PROTOCOLS using step 4 as a starting point and determine what types of arguments need to be sent with a message and what type of object a method can return. Here we also document the preconditions and postconditions that solidify the responsibilities of the class.
  • 6 - SETUP the CLASS FRAMEWORKS for testing newly identified classes to make sure components integrate and fulfill our goals
  • 7 - TEST the CLASS INTERFACE and "add some flesh" to step 6 by sending the messages defined in the protocol to the class interface before developing the methods
  • 8 - FILL in the METHODS and CLASS ATTRIBUTES for each class to make sure object's methods work
  • 9 - TEST, DEBUG AND INTEGRATE all the parts :)
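
As a rough illustration of steps 3 through 8, here is a minimal Python sketch of one class with its responsibility, collaborator, message protocol, and documented preconditions and postconditions, followed by an interface test. The `Cart` class, its method names, and the item prices are all hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Cart:
    """Hypothetical class, not taken from the project.

    Responsibility: track the items a shopper has pulled from shelves.
    Collaborator: a price table (a plain dict here, for illustration).
    """

    prices: Dict[str, int]                               # item -> price in cents
    items: Dict[str, int] = field(default_factory=dict)  # item -> quantity held

    def add_item(self, name: str, quantity: int = 1) -> int:
        """Message: add_item(name, quantity) -> new quantity of that item.

        Precondition: `name` is a known item and `quantity` is positive.
        Postcondition: the held quantity grew by exactly `quantity`.
        """
        if name not in self.prices:
            raise KeyError(f"unknown item: {name}")
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        before = self.items.get(name, 0)
        self.items[name] = before + quantity
        assert self.items[name] == before + quantity  # postcondition check
        return self.items[name]

    def total_cents(self) -> int:
        """Message: total_cents() -> checkout total in cents."""
        return sum(self.prices[name] * qty for name, qty in self.items.items())


def test_cart_interface() -> None:
    """Steps 6-7: exercise the interface with the agreed messages."""
    cart = Cart(prices={"soda": 150, "chips": 200})
    assert cart.add_item("soda") == 1
    assert cart.add_item("soda", 2) == 3
    assert cart.add_item("chips") == 1
    assert cart.total_cents() == 650


test_cart_interface()
```

Writing the test against the interface first means the method bodies can be filled in afterwards (step 8) without changing what callers rely on.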

4 CORNER STORE v-1 DATASET

ACTIONS / OBJECTS TRACKED

  • Items in Store (~80)
  • Single Handed Pulls of Items From Shelf
  • Double Handed Pulls of Items From Shelf
  • Hand Offs of Items Between People
  • Non Shelf Pulls of Items
  • Tosses of Items Between People
  • Misplaced Items
  • Final Item Inventory and Checkout
  • This dataset is a growing list. It is mainly focused on scenarios that require long-range temporal reasoning to tell apart. Other scenarios focus more on the object-action relationship, such as people pulling items from a shelf. The current dataset has 100 action-object classes, with 100 clips per class and each clip running approximately 10 seconds. In total the v1 dataset has 10,000 videos. Note that the clips have variable resolutions and frame rates.
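
Because the clips have variable frame rates, one common preparation step is to resample every clip to a shared frame rate before training. The sketch below shows one way to pick frame indices for that. The function name `resample_indices` and the 10 fps target are assumptions for illustration, not details from the project.

```python
from typing import List


def resample_indices(frame_count: int, source_fps: float, target_fps: float) -> List[int]:
    """Return the source-frame indices that approximate `target_fps`.

    Hypothetical helper. Frame k of the output is taken from the source frame
    closest below time k / target_fps. If target_fps is higher than
    source_fps, some indices repeat.
    """
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")

    step = source_fps / target_fps  # source frames per output frame
    indices = []
    k = 0
    while int(k * step) < frame_count:
        indices.append(int(k * step))
        k += 1
    return indices


# Two ~10 second clips recorded at different rates give the same number of
# frames once both are resampled to an assumed 10 fps.
clip_a = resample_indices(frame_count=300, source_fps=30.0, target_fps=10.0)
clip_b = resample_indices(frame_count=250, source_fps=25.0, target_fps=10.0)
print(len(clip_a), len(clip_b))  # 100 100
```

If the clips are read with OpenCV, the frame count and frame rate can be obtained from `cv2.VideoCapture.get` with `cv2.CAP_PROP_FRAME_COUNT` and `cv2.CAP_PROP_FPS`. Variable resolution would need a separate resize step.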

4 "PRETRAINING"

  • We use an offshoot of an ImageNet-trained model to pre-train our system. We are immensely grateful for ImageNet and the human labor it took to sort, label and prepare millions of images across thousands of categories. Pre-training helps us avoid representational bottlenecks, boost our activations per tile and better balance the width and depth of the network. What's more, this approach helps us better understand how to model spatio-temporal cues at different scales across multiple cameras.

5 ADDRESSING DATA BIAS

6 ANALYSIS, DESIGN & IMPLEMENTATION

7 TRAINING DETAILS

  • Learning Rates
  • Additional Data Prep
  • Testing

8 EXPERIMENTS

  • In this section we explore the accuracy and generalization of models on different benchmark data sets.
  • Accuracy = Repeatability + Calibration
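
The README does not define the two terms in "Accuracy = Repeatability + Calibration", so the sketch below is only one possible reading. It treats repeatability as the spread of accuracy across repeated runs and calibration as the gap between the model's average confidence and its observed accuracy. Both function names and both definitions are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def repeatability(run_accuracies: List[float]) -> float:
    """Assumed definition: standard deviation of accuracy over repeated runs.

    Lower values mean the result is more repeatable.
    """
    if len(run_accuracies) < 2:
        raise ValueError("need at least two runs")
    return pstdev(run_accuracies)


def calibration_gap(confidences: List[float], correct: List[bool]) -> float:
    """Assumed definition: |mean confidence - observed accuracy|.

    A well-calibrated model that is 75% confident on average should be
    right about 75% of the time, giving a gap near zero.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and the same length")
    observed = mean([1.0 if c else 0.0 for c in correct])
    return abs(mean(confidences) - observed)
```

Under this reading, a model would be reported with its mean accuracy, its run-to-run spread, and its calibration gap together, so that a single lucky run is not mistaken for a reliable result.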

9 RESULTS

10 MEASURING ACCURACY ACROSS BENCHMARK DATASETS

  • UCF-101 DATASET - 13320 videos across 101 action classes
  • Accuracy:
  • HMDB-51 DATASET - 6766 videos across 51 action classes
  • Accuracy:
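
The accuracy figures above are still to be filled in. As a minimal sketch of how they could be computed, the helpers below calculate top-1 accuracy for one test split and average it over several splits, since UCF-101 and HMDB-51 each ship with three train/test splits. The function names and the toy labels are assumptions and are not results from this project.

```python
from typing import List, Sequence


def top1_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of clips whose predicted class matches the labeled class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one clip")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def mean_over_splits(split_accuracies: List[float]) -> float:
    """Average the per-split accuracies into one reported number."""
    if not split_accuracies:
        raise ValueError("need at least one split")
    return sum(split_accuracies) / len(split_accuracies)


# Toy labels only, to show the call shape.
toy_predicted = ["pull", "toss", "pull", "handoff"]
toy_actual = ["pull", "toss", "handoff", "handoff"]
print(top1_accuracy(toy_predicted, toy_actual))  # 0.75
```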

11 DISCUSSION

APPENDIX A: "CORNER STORE v-1" EXPANDED

ACKNOWLEDGEMENTS

  • This work was supported by my immediate family, close friends, confidants, my neighborhood grocers/corner stores and the open source community.

SANDBOX

  • Teaching computer vision systems NOT to use guest biometrics as a condition of accepting payment under the Song-Beverly Credit Card Act of 1971
  • Determining the metrics that improve downstream performance
  • Better understanding the front-back orientation of human limbs when it is obscured by clothing, lighting, background or optical interference
  • "What You Almost Bought" Merchant Facing Application
  • Bridging the edge-cloud barrier to lower server costs?
  • Rewriting code and making arrays more ergonomic
  • Lowering the costs of switching data across different corner stores
  • Better understanding how social, economic and legal systems work together to achieve goals

REFERENCES

  • Writing Ethnographic Fieldnotes, 2nd Edition (2011)
  • The Great Good Place by Ray Oldenburg (1989)
  • The Political Value of Time by Elizabeth F. Cohen (2018)
  • Building Business Applications Using C++
  • Hands On Machine Learning with C++ (2020)
  • Learning OpenCV 3: Computer Vision in C++ with the OpenCV Library (2017)
  • "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset" (2018)
  • "The Kinetics Human Action Video Dataset" (2017)
  • "Two Stream Convolutional Networks for Action Recognition in Videos"
  • "Towards Good Practices for Very Deep Two-Stream ConvNets" (2015)
  • "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks" (2016)
  • "Going Deeper with Convolutions" (2014)
  • "Rethinking the Inception Architecture for Computer Vision" (2015)
