Two CNNs were paired in tandem: one ran inference on an audio clip while the other ran inference on a series of video frames. Each model predicted the emotion of its input, and the two predictions were combined into an overall emotion. This emotion was used to gauge the overall feeling of the environment, so that an appropriate song could be played to match it.
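The fusion and song-selection steps above can be sketched as follows. This is a minimal illustrative example, not the project's actual code: the emotion labels, song table, and equal-weight averaging of the two CNNs' softmax outputs are all assumptions.

```python
# Hypothetical sketch: each CNN emits a probability distribution over
# emotions; the two distributions are averaged, and the argmax emotion
# selects a matching song. All names and values are illustrative.

EMOTIONS = ["happy", "sad", "angry", "calm"]

# Hypothetical lookup table mapping a predicted emotion to a track.
SONGS = {
    "happy": "upbeat_track.mp3",
    "sad": "mellow_track.mp3",
    "angry": "heavy_track.mp3",
    "calm": "ambient_track.mp3",
}

def fuse_predictions(audio_probs, video_probs):
    """Average the per-emotion probabilities from the two CNNs."""
    return [(a + v) / 2 for a, v in zip(audio_probs, video_probs)]

def pick_song(audio_probs, video_probs):
    """Return the emotion with the highest fused score and its song."""
    fused = fuse_predictions(audio_probs, video_probs)
    emotion = EMOTIONS[fused.index(max(fused))]
    return emotion, SONGS[emotion]

if __name__ == "__main__":
    # Example softmax outputs from the audio and video CNNs.
    audio = [0.6, 0.1, 0.1, 0.2]
    video = [0.5, 0.2, 0.1, 0.2]
    print(pick_song(audio, video))  # fused argmax is "happy"
```

In a real pipeline the two probability vectors would come from the trained audio and video models; only the combination logic is shown here.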
I contributed all files in this folder except 'analysis.py', which was written by a teammate.