README FILE
Author: Jianyuan (Jet) Yu
Affiliation: Wireless, ECE, Virginia Tech
Email : [email protected]
Date : April, 2018

Overview
News
ToDoList
- Capacity_Robust
- PolicyGradient
- POMDP
- Stochastic
- ScaleUp
- Doxy
Notice
Bugs
Related Files:
Reference:
Tutorial of Deep Reinforcement Learning
Ongoing Work - POMDP
Configuration
File Topology
How to run

Overview

This project applies a deep Q-network (DQN)[1] to dynamic channel access. It evaluates how well an intelligent node can access channels without exchanging information with other nodes (legacy, hopping, intermittent, dsa, etc.). The main concerns are convergence speed and scalability.
Specifically, we look into the following aspects:

coexistence with other types of nodes
- legacy
- legacy with tx prob
- hopping
- intermittent(duty cycle)
- dsa (able to wait)
- poisson (arrivals follow a Poisson process, so the inter-arrival and service intervals are exponentially distributed, i.e. the M/M/1 queue model)
- mdp
- dqn
learn to wait
learn to occupy more than one channel
learn to avoid hidden nodes
learn to utilize spatial reuse (exposed nodes)
select good channels (when several channels are available, low-quality channels bring a low reward).
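
A minimal sketch of how the poisson node in the list above could generate channel occupancy, assuming a slotted channel and a single first-come-first-served queue. The function name `poisson_node_occupancy` and its parameters are hypothetical and are not taken from this repository.

```python
import random


def poisson_node_occupancy(arrival_rate, service_rate, num_slots, seed=0):
    """Return a 0/1 list; 1 marks a slot in which the M/M/1 node transmits.

    arrival_rate and service_rate are given per slot (assumed units).
    """
    rng = random.Random(seed)
    busy = [0] * num_slots
    t_arrival = rng.expovariate(arrival_rate)  # time of the first packet arrival
    t_free = 0.0                               # time at which the server goes idle
    while t_arrival < num_slots:
        # A packet waits in the queue if the previous one is still in service.
        start = max(t_arrival, t_free)
        t_free = start + rng.expovariate(service_rate)
        # Mark every slot touched by the transmission interval.
        for slot in range(int(start), min(int(t_free) + 1, num_slots)):
            busy[slot] = 1
        t_arrival += rng.expovariate(arrival_rate)
    return busy


if __name__ == "__main__":
    trace = poisson_node_occupancy(arrival_rate=0.2, service_rate=0.5, num_slots=20)
    print(trace)
```

Keeping the arrival rate below the service rate keeps the queue stable, so the channel is idle part of the time and a learning node has free slots to find.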

The inspiration comes from the SC2 competition, and some papers[2][3] have already started work on this problem.

The project ports Chris's Matlab MDP-DCA simulator as a starting point, using an MDP Python solver[4], and then adapts a DQN Python solver[5].
Another repository[6], maintained by Yue, will be merged soon, and [7] is the technical report.

News

(Fri Aug 3) Coexistence of multiple learning nodes fixed; started running the scale-up case.
(Tue Jul 17) 2-state Markov chain node added
(Fri Jun 29) stack-DQN
- add partial observation node with the shortened observation as state
- add partial observation node with the shortened observation padded with zeros/ones to the full state size
- add partial observation node with several partial observations stacked together as state
(Sun Jun 24) some new features
- add poisson node, modeled on M/M/1 queueing theory, with configurable arrival rate and service rate
- add policy gradient learning node, namely the deep policy gradient (dpg) node
- rename nodes and adopt string names: dumb nodes are numbered under 9, learning nodes start at 10 or more
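
The three partial-observation variants from the (Fri Jun 29) stack-DQN entry can be sketched as follows. This is an illustration only: it assumes the full observation is a 0/1 list with one entry per channel, and the helper names (`shorten`, `pad`, `StackedObservation`) are hypothetical rather than the ones used in dqnNode.py.

```python
from collections import deque


def shorten(full_obs, visible):
    """Keep only the channels the node can observe."""
    return [full_obs[ch] for ch in visible]


def pad(full_obs, visible, fill=0):
    """Keep the full state size, filling unobserved channels with 0 or 1."""
    return [full_obs[ch] if ch in visible else fill
            for ch in range(len(full_obs))]


class StackedObservation:
    """Concatenate the last `depth` shortened observations into one state."""

    def __init__(self, visible, depth):
        self.visible = visible
        # Start from all-zero frames so the state size is fixed from step one.
        self.frames = deque([[0] * len(visible) for _ in range(depth)],
                            maxlen=depth)

    def push(self, full_obs):
        self.frames.append(shorten(full_obs, self.visible))
        return [bit for frame in self.frames for bit in frame]


if __name__ == "__main__":
    stack = StackedObservation(visible=[0, 2], depth=2)
    print(shorten([1, 0, 1, 1], [0, 2]))   # [1, 1]
    print(pad([1, 0, 1, 1], [0, 2]))       # [1, 0, 1, 0]
    print(stack.push([1, 0, 1, 1]))        # [0, 0, 1, 1]
```

Stacking gives the network a short history of the visible channels, which can partly make up for the channels it cannot see in any single slot.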

ToDoList

Capacity_Robust

learn poisson node
learn legacy node with fixed biased tx prob
learn long intermittent (im) node under limited memory and steps
dynamic environment
learn to greedily occupy all available channels
~~efficient multiple dsa node coexist~~
~~multiple dqn node coexist~~
merge Yue's guess item & eligibility trace dqn node

PolicyGradient

POMDP

vi
pomcp

Stochastic

poisson
uniform
2-state Markov chain
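
For the 2-state Markov chain item, a minimal sketch of a node whose channel alternates between idle (0) and busy (1) with fixed transition probabilities. The function name and parameter names are assumptions made for illustration, not the repository's API.

```python
import random


def markov_chain_occupancy(p_idle_to_busy, p_busy_to_idle, num_slots, seed=0):
    """Return a 0/1 occupancy trace of a 2-state Markov chain, starting idle."""
    rng = random.Random(seed)
    state = 0
    trace = []
    for _ in range(num_slots):
        trace.append(state)
        if state == 0:
            # Idle: switch to busy with probability p_idle_to_busy.
            state = 1 if rng.random() < p_idle_to_busy else 0
        else:
            # Busy: switch to idle with probability p_busy_to_idle.
            state = 0 if rng.random() < p_busy_to_idle else 1
    return trace


if __name__ == "__main__":
    print(markov_chain_occupancy(0.3, 0.6, num_slots=20))
```

In the long run the channel is busy a fraction p_idle_to_busy / (p_idle_to_busy + p_busy_to_idle) of the time, so the two probabilities set both the load and how bursty the occupancy is.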

ScaleUp

Doxy

Notice

When a new number of channels is assigned and a DQN node exists, restart the IPython console; otherwise a size mismatch error pops up. This does not happen in a raw terminal.
mdpNode hits a computation limit when the number of channels exceeds 10, which results in a dead loop (stuck at stateSpaceCreate).
The assignment of partial observation is currently done in the dqnNode.py file, though in a clumsy way.
Because the .pyc files are git-ignored, they are not fetched on pull. Running the code without them may raise an error like ImportError: No module named ddpgNode. Never mind, just execute again and again (to generate more .pyc files) until it runs through.

Bugs

~~We assume all nodes detect and make decisions at the same time, hence multiple dsaNodes may collide (T.B.D.)~~. -> add politeness to dsa nodes to avoid the ping-pong effect, an ugly way.
~~Unstable performance when multiple dqnNodes work together (T.B.D.)~~. -> assign priorities to learning nodes so that they observe and act one by one, an ugly way.
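
The priority workaround can be pictured roughly as below. This is only a sketch under stated assumptions: the node names, the `(name, priority)` pairs, and the stand-in "take the first idle channel" policy are all hypothetical, and a real learning node would pick its action from its DQN instead.

```python
def act_in_priority_order(occupied, nodes):
    """Let learning nodes observe and act one by one instead of simultaneously.

    occupied: 0/1 list per channel, set by the non-learning nodes.
    nodes: list of (name, priority) pairs; a lower number acts first.
    Returns a dict mapping node name to its chosen channel, or None for "wait".
    """
    state = list(occupied)
    choices = {}
    for name, _priority in sorted(nodes, key=lambda item: item[1]):
        # Each node sees the channels already taken by higher-priority nodes.
        idle = [ch for ch, busy in enumerate(state) if not busy]
        choice = idle[0] if idle else None  # stand-in for the learned policy
        if choice is not None:
            state[choice] = 1
        choices[name] = choice
    return choices


if __name__ == "__main__":
    print(act_in_priority_order([1, 0, 0], [("dqn11", 2), ("dqn10", 1)]))
```

Because each node observes the updated state before acting, two learning nodes never choose the same idle channel in the same slot, at the cost of a fixed ordering among them.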
