
intercom's Introduction

InterCom

InterCom is a low-latency full-duplex intercom(municator) designed for the transmission of media (at this moment, only audio) between networked users. It is implemented in Python and structured as a set of layers that provide incremental functionality, following a multilevel (one-to-one) inheritance model:

  1. minimal: records/plays raw (CD quality) audio, and sends/receives the chunks of audio to/from another intercom instance.
  2. buffer: delays the playing of chunks to hide the network jitter.
  3. DEFLATE*: uses DEFLATE to compress the chunks.
  4. BR_control*: uses quantization to control the transmission bit-rate.
  5. stereo_MST_coding*: removes spatial (inter-channel) redundancy.
  6. temporal_coding*: removes temporal (intra-channel) redundancy.
  7. basic_ToH: removes psycho-acoustic redundancy, exploiting the expected threshold of hearing (ToH).

intercom's People

Contributors

alcoiz, alexcraviotto, antoniojesu, arocalo, avt276, cobeguel, cristiandc27, gervillaesphoto, gonzabm, hamzaelfallah, jarh57, jesuscazorla, jmmateo14, juanrdzbaeza, laroga, meoko97, miguel07alm, mohahnina, panteleevnikita, pcoloc, raquelgg, rtyui5, smvg, vicente-gonzalez-ruiz


intercom's Issues

5. Provide data-flow control

Packets can be lost when the link between two (or more) interlocutors is congested. A simple algorithm to reduce the congestion is to report to the interlocutor the number of packets that arrived on time.

As can be seen in line 2.1 of the Readme, the player understands the media and can control the flow received from the receiver. Besides, the sender process is also (data-flow) controlled. This means that the buffer of quality layers should keep its fullness level over time.

The size of the buffer is configured by the user. So, given a buffer size, the number of packets that could not be buffered (because they never arrived or arrived late) can be used to request a smaller number of quality layers from the interlocutor.
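
A minimal sketch of this feedback rule (the function name, the policy, and the numbers are only illustrative, not intercom's actual algorithm): the receiver counts how many packets arrived on time during a buffering period and derives from that the number of quality layers to request from the sender.

def requested_layers(buffer_size, on_time_packets, max_layers):
    """Scale the number of requested quality layers by the fraction of
    packets that could actually be buffered (a hypothetical policy)."""
    fraction = on_time_packets / buffer_size      # 1.0 means no losses
    return max(1, round(max_layers * fraction))   # always request at least 1

# Example: with a 16-cell buffer, 12 packets arrived on time and 32 possible
# quality layers, the receiver would ask the sender for 24 layers.
print(requested_layers(16, 12, 32))               # -> 24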

Working with a variable number of levels of the DWT

issue#23 split the wavelet coefficients (represented as 32-bit integers) into 32 bit-planes. The number of levels of the DWT is fixed at 5.

In this issue, modify the code of issue#23 to give the user the option of selecting the number of levels. Introduce this value from the command line. A good Python package for handling command-line arguments is https://docs.python.org/3/library/argparse.html

A straightforward extension of this issue is to also use argparse to introduce the rest of the arguments to the intercom, such as the chunk size, the sampling rate, and the number of channels.
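
A possible argparse front-end (the option names and defaults below are only an illustration, not necessarily the ones used by intercom):

import argparse

parser = argparse.ArgumentParser(description="InterCom")
parser.add_argument("-l", "--levels", type=int, default=5,
                    help="number of levels of the DWT")
parser.add_argument("-s", "--chunk_size", type=int, default=1024,
                    help="samples per chunk")
parser.add_argument("-r", "--sampling_rate", type=int, default=44100,
                    help="sampling rate in frames/second")
parser.add_argument("-c", "--channels", type=int, default=2,
                    help="number of channels")
args = parser.parse_args()
print(args.levels, args.chunk_size, args.sampling_rate, args.channels)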

Fix the port selection bug

The threads expect to receive a tuple as an argument, and when the port (a string with more than one digit) is passed to them, it gets decomposed into a tuple of digits, e.g. "4004" becomes ("4", "0", "0", "4").
Check exactly how this happens and investigate possible solutions.
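
A minimal sketch of the likely cause and fix, assuming the port is passed directly as the args of a thread: threading.Thread unpacks its args iterable when calling the target, so a bare string is split into its characters, while a one-element tuple avoids that.

import threading

def listen(port):
    print("listening on port", port)

port = "4004"

# Buggy: args must be a tuple/list; a bare string is unpacked character by
# character, so listen() would receive ("4", "0", "0", "4").
# threading.Thread(target=listen, args=port).start()

# Fix: wrap the port in a one-element tuple (and convert it to int if needed).
threading.Thread(target=listen, args=(int(port),)).start()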

Capture the packets generated by intercom

Issue #26 implements, for each chunk of audio, the sending of 32 UDP packets (one for each bit-plane) by the emitter intercom. Using Wireshark, capture the packets generated by a chunk of audio (after performing the DWT) and analyze them.

2. Data-flow control and prioritization

Network congestion (and sometimes transmission errors) can cause the loss of packets sent by intercom. The contribution of the content of the 32 packets/chunk sent by intercom is not the same (the most significant bit-planes are more important than the least significant ones). Define a data-flow algorithm for controlling that, if packet loss appears:

(1) The most important packets are transmitted before the least important.
(2) Network congestion is reduced, and if possible, eliminated.
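
A sketch of the prioritization part of such an algorithm (the feedback variable is hypothetical): bit-planes are always sent from the most significant to the least significant, and under congestion only the tail of least important packets is dropped.

def planes_to_send(number_of_planes=32, planes_received_on_time=32):
    """Bit-plane indices to transmit for the next chunk, MSbP first,
    throttled by the feedback reported by the receiver."""
    return list(range(number_of_planes - 1,
                      number_of_planes - 1 - planes_received_on_time, -1))

print(planes_to_send(32, 32))   # no congestion: planes 31 .. 0
print(planes_to_send(32, 20))   # congestion: only planes 31 .. 12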

1. Remove binaural redundancy

Left and right channels are quite similar (sometimes, identical). Encode the right channel as the sample-by-sample difference between the right and the left channel. In other words, compute

R = R - L

and in the receiver intercom, restore the original right channel with:

R = R + L
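
A sketch with NumPy, assuming each chunk is an array of shape (frames, 2) with 16-bit samples (a wider dtype is used during the subtraction to avoid overflow):

import numpy as np

def remove_binaural_redundancy(chunk):
    chunk = chunk.astype(np.int32)      # room for the difference
    chunk[:, 1] -= chunk[:, 0]          # R = R - L
    return chunk

def restore_right_channel(chunk):
    chunk[:, 1] += chunk[:, 0]          # R = R + L
    return chunk.astype(np.int16)

stereo = np.array([[100, 103], [-5, -5]], dtype=np.int16)
assert np.array_equal(restore_right_channel(remove_binaural_redundancy(stereo)), stereo)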

3. Minimize the reconstruction error

The Human Auditory System (HAS) is more sensitive to low-amplitude sounds (loud audio signals carry less auditory information than quiet ones).

Audio signals are represented with positive and negative samples. Negative samples use 2's complement binary representation. Thus, for example:

-1|10 = 1111 1111 1111 1111|2

(-1 in decimal is represented by 16 ones in the 2's complement representation)

The partial reconstruction of positive samples, when we suppose that the missing (not received) bit-planes are 0, works well: if the original sample is small and only the least significant bit-planes are not received, the reconstruction error is small. However, if the transmitted sample is negative and we suppose that the not-received bit-planes are 0, we will generate a big reconstruction error. For example, if we transmit the 2 most significant bit-planes of -1|10, we will reconstruct the number:

1100 0000 0000 0000|2 = -16384|10

This problem can be addressed using different techniques. One is to work with the sign-magnitude representation of the samples. Thus, the sample -1|10 would be represented by:

-1|10 = 1000 0000 0000 0001|2

and if this sample is partially transmitted (using only 2 bit-planes), we would obtain:

1000 0000 0000 0000|2 = -0|10

which generates a small reconstruction error.

Another possibility is to suppose that the unknown bit-planes of a negative sample are all 1 (we know that a sample is negative as soon as its most significant bit-plane, which carries the sign, has been received). Thus, if we receive only the 2 most significant bit-planes of the sample -1|10, we get:

1111 1111 1111 1111|2 = -1|10

which produces a reconstruction error = 0.

Obviously, large negative samples will be reconstructed with larger errors, but in this case, the HAS will mask them.
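
A small numeric check of the argument above (pure-Python bit manipulation, 16-bit samples): keeping only the 2 most significant bit-planes of -1 and zero-filling the rest yields -16384, while assuming that the missing planes of a negative sample are all 1 recovers -1 exactly.

def from_twos_complement(pattern, bits=16):
    """Interpret a bits-wide bit pattern as a 2's complement integer."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern

def keep_top_planes(pattern, kept, bits=16):
    """Zero out all but the `kept` most significant bit-planes."""
    mask = ((1 << kept) - 1) << (bits - kept)
    return pattern & mask

partial = keep_top_planes(0xFFFF, kept=2)        # -1 is 0xFFFF in 2's complement
print(from_twos_complement(partial))             # -16384 (big error)

ones = (1 << (16 - 2)) - 1                       # fill the 14 missing planes with 1s
print(from_twos_complement(partial | ones))      # -1 (error = 0)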

OLD. Show recording and playing volume meters

Compute, for each received (played) and sent (recorded) chunk, the maximum sample and store it in two different shared-memory integer variables. Print both variables in the main process as horizontal bars with a maximum value (a minimal sketch follows the example below).

Example:

You                               Other(s)
-quiet----------------------loud- -quiet----------------------loud-
##                                ####
####                              ##
#######                           # 
######                            # 
########                          ##
####                              #######
##                                ####
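
A minimal sketch of the shared variables and the bar rendering (the variable names, the bar width, and the example readings are only illustrative):

import multiprocessing

recorded_level = multiprocessing.Value('i', 0)   # updated by the recording task
played_level = multiprocessing.Value('i', 0)     # updated by the playing task

def bar(level, maximum=32767, width=32):
    """Render a level in [0, maximum] as a horizontal bar of '#' characters."""
    return '#' * max(1, round(width * level / maximum))

recorded_level.value, played_level.value = 8000, 20000   # example readings
print(f"{bar(recorded_level.value):<34}{bar(played_level.value)}")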

Transform the transmitter and the receiver into a basic intercom

The transmitter only produces audio and the receiver only consumes audio. Obviously, both elements should do both tasks.

Mix the code of the transmitter and the receiver to build a simple_intercom. Basically, what you need to do is to put the original transmitter and receiver elements in different threads so that they work in parallel.

Implement the simple_intercom using threads and using processes, in two different classes. Compare the performance of both alternatives by measuring the lost chunks under different CPU and network conditions.
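
A thread-based skeleton of the structure (the queue below only stands in for the real record/send/receive/play calls, so the sketch is self-contained and runnable):

import threading
import queue

network = queue.Queue()                 # stands in for the UDP socket

def transmitter():
    for i in range(3):
        chunk = f"chunk {i}"            # stands for: record a chunk of audio
        network.put(chunk)              # stands for: send it to the receiver

def receiver():
    for _ in range(3):
        chunk = network.get()           # stands for: receive a chunk
        print("playing", chunk)         # stands for: play it

threading.Thread(target=transmitter).start()
receiver()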

Split the wavelet coefficients in a set of bit-planes

After computing the DWT of a chunk of audio, an array of float coefficients is obtained. Copy the testing_DWT.py module to a new one called testing_bit_plane_representation.py and remove the code that shows the statistics. Modify this code to (a sketch of steps 3 and 4 follows the list):

  1. Capture a chunk of audio.
  2. Perform the DWT of the chunk.
  3. Split the array of wavelet coefficients (represented as integers) in a set of 32 bit-planes.
  4. Reconstruct the wavelet coefficients using the set of bit-planes.
  5. Perform the inverse DWT.
  6. Reproduce the chunk of audio.
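
A sketch of steps 3 and 4 with NumPy, assuming the coefficients have already been converted to 32-bit integers:

import numpy as np

def split(coefficients):
    """Split 32-bit integer coefficients into 32 bit-planes (MSbP first)."""
    bits = coefficients.astype(np.uint32)       # reinterpret the sign bit
    return [(bits >> plane) & 1 for plane in range(31, -1, -1)]

def reconstruct(planes):
    """Rebuild the coefficients from the (complete) list of bit-planes."""
    bits = np.zeros_like(planes[0])
    for index, plane in enumerate(planes):
        bits |= plane << (31 - index)
    return bits.astype(np.int32)

coefficients = np.array([-7, 0, 123456], dtype=np.int32)
assert np.array_equal(reconstruct(split(coefficients)), coefficients)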

5. Let's move to the Discrete Wavelet Transform (DWT) domain!

DWT is a mathematical tool for transforming signals into a different representation domain called "the DWT domain".

            This is the signal domain
                        |
    +-------------------+-------------------+
    |                                       |
    v                                       v
 samples  +-----+ coefficients +------+  samples
--------->| DWT |------------->| iDWT |--------->
          +-----+       ^      +------+
                        |
               This is the DWT domain

In the case of audio, the samples are transformed into coefficients. The range of possible values of the DWT coefficients is wider than that of the original samples (compared to the samples, we will need more bit-planes to represent the coefficients), and most of the energy of the signal is accumulated in a small number of coefficients (therefore, by transmitting only the most energetic coefficients we can reconstruct a good approximation of the original samples). If the DWT is reversible, then when all the bit-planes of all the coefficients are transmitted, the reconstructed signal will be identical to the original.

In this issue, we will use PyWavelets to transform the samples of each chunk (only the channel not processed by the binaural encoding) into coefficients before transmitting them. After the reception, we will use the inverse transform (iDWT) to recover the original samples, or an approximation of them if some of the bit-planes are missing.
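
A minimal PyWavelets round trip (the wavelet name, the number of levels, and the random chunk are only placeholders):

import numpy as np
import pywt

chunk = np.random.randint(-32768, 32768, 1024).astype(np.float64)  # fake mono chunk

coefficients = pywt.wavedec(chunk, wavelet="db5", level=5)    # samples -> coefficients
reconstructed = pywt.waverec(coefficients, wavelet="db5")     # coefficients -> samples

print(np.max(np.abs(reconstructed[:len(chunk)] - chunk)))     # ~0 (floating-point noise)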

Implementation of a simple audio transmitter and a simple audio receiver

The transmitter is an infinite loop that:

  1. Records a chunk of audio.
  2. Sends the chunk to the receiver.

The receiver is an infinite loop that:

  1. Receives a chunk.
  2. Plays the chunk.

Use UDP as the transport protocol. Define the chunk size as an input parameter from the command line. Use Python and PyAudio. Use CD quality (44100 samples/second, 16 bits/sample, 2 channels) for recording the audio.
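
A sketch of both loops using PyAudio and UDP (the port 4444, "localhost", and the chunk size of 1024 frames are placeholder values, not intercom's defaults):

# transmitter (sketch)
import socket
import pyaudio

CHUNK, RATE, CHANNELS = 1024, 44100, 2
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
stream = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=CHANNELS,
                                rate=RATE, input=True, frames_per_buffer=CHUNK)
while True:
    chunk = stream.read(CHUNK)                    # 1. record a chunk
    sock.sendto(chunk, ("localhost", 4444))       # 2. send it to the receiver

# receiver (sketch, run as a separate process)
import socket
import pyaudio

CHUNK, RATE, CHANNELS = 1024, 44100, 2
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", 4444))
stream = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=CHANNELS,
                                rate=RATE, output=True, frames_per_buffer=CHUNK)
while True:
    chunk, _ = sock.recvfrom(CHUNK * CHANNELS * 2)   # 1. receive a chunk (bytes)
    stream.write(chunk)                              # 2. play it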

toWavelet module

Create the module that converts the raw data into a sequential numpy array and transforms the raw audio with PyWavelets.

6. Remove the temporal redundancy in the bit-planes using Binary RLE

Bit-planes show temporal redundancy because samples are correlated in time. As a consequence, we can find sequences (runs) of bits all equal to '0' or '1'. A Binary Run-Length Encoder (Binary-RLE) can exploit such redundancy to compress the representation of the bit-planes. Thus, for example, the bit-plane:

00000 1 000 1 0 11 00000 1 0 1 0000000

Can be represented by:

4 0 2 0 0 1 4 0 0 0 6
^
|
+--- This is the first code-word of the code-stream

Where each "code-word" represents the length of a run (the number of consecutive bits with the same value) minus one. Represent each code-word using 8 bits. Send, bit-plane by bit-plane, the shortest representation of the bit-plane (which could be the original representation of the bit-plane rather than the Binary-RLE version). Notice that Binary-RLE can be applied recursively.
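
A sketch of the encoder and decoder over a bit-plane given as a string of '0'/'1' characters (it assumes the decoder knows the value of the first bit and, for simplicity, it does not cap the runs at 256 bits as 8-bit code-words would require):

def brle_encode(plane):
    """Binary RLE: one code-word (run length - 1) per run."""
    runs, current, length = [], plane[0], 0
    for bit in plane:
        if bit == current:
            length += 1
        else:
            runs.append(length - 1)
            current, length = bit, 1
    runs.append(length - 1)
    return runs

def brle_decode(runs, first_bit='0'):
    """Rebuild the bit-plane, alternating the bit value at each run."""
    bits, current = [], first_bit
    for code in runs:
        bits.append(current * (code + 1))
        current = '1' if current == '0' else '0'
    return ''.join(bits)

plane = '0000010001011000001010000000'
codes = brle_encode(plane)
print(codes)                         # [4, 0, 2, 0, 0, 1, 4, 0, 0, 0, 6]
assert brle_decode(codes) == plane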

Compress (and decompress) the mini-chunks (delete me)

Lossless compression of the mini-chunks can help to reduce the bandwidth requirements, increasing the number of transmitted mini-chunks.

Implement a new version of the intercom: progressive_compress_intercom, where the mini-chunks are compressed. Try (at least) the following "text" compressors:

  1. Run-length encoding.
  2. Huffman.
  3. Lempel-Ziv (or any variation of this technique).

And decide (in terms of compression) the most efficient alternative.
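
A quick way to compare off-the-shelf candidates from the Python standard library (zlib combines an LZ77-style dictionary coder with Huffman coding; the silent mini-chunk is just a toy input):

import zlib, bz2, lzma

mini_chunk = bytes(1024)                       # a silent 1 KB mini-chunk
for name, compress in (("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)):
    print(name, len(compress(mini_chunk)), "bytes")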

Transform and de-transform audio

Create a Python module (named testing_DWT.py) that, using PyAudio and PyWavelets, in an infinite loop does:

  1. Capture a chunk of audio.
  2. Print the highest positive and negative sample, and compute the entropy of the chunk.
  3. Transform the chunk of audio using a DWT (Discrete Wavelet Transform).
  4. Print the highest positive and negative wavelet coefficient, and compute the entropy of the transformed chunk.
  5. Compute the inverse transform of the transformed chunk.
  6. Print the highest positive and negative sample, and compute the entropy of the chunk.
  7. Play the chunk of audio.

Note: use only one channel of audio. A possible entropy computation is sketched below.
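
A possible way to compute the per-chunk statistics of steps 2, 4 and 6 (the entropy here is the empirical zero-order entropy, in bits/sample):

import numpy as np

def entropy(samples):
    """Empirical (zero-order) entropy in bits/sample of an integer array."""
    _, counts = np.unique(samples, return_counts=True)
    probabilities = counts / counts.sum()
    return -np.sum(probabilities * np.log2(probabilities))

chunk = np.array([0, 0, 1, -1, 0, 2, 0, 0], dtype=np.int16)
print(chunk.max(), chunk.min(), entropy(chunk))   # 2 -1 ~1.55 bits/sample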

Find out how to capture audio

Capture 44100 samples/second, 16 bits/sample, 2 channels. Write the samples to disk (for now, just to check that the capture works), without any header, using the machine's endianness.

Buffer and sort the chunks of audio

In the current implementation of intercom, the chunks are sent in UDP packets (one packet per chunk), which can be lost or reordered (shuffled) by the network. Lost chunks cannot be recovered, but we can address the network shuffling "problem".

One technique to put the chunks in the right order is to transmit each chunk of audio with a chunk number (so, the structure of a packet would be <chunk number, chunk of audio>), and use the chunk number to sort the chunks of audio in the buffer.

Notice that those buffer cells that are not filled with a (lost) chunk should remain empty (an array of zeros) in the buffer. Zeros do not produce sound when played.
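
A sketch of the buffer (its size and the chunk size are placeholders): chunks are stored by their chunk number modulo the buffer size, and a cell that was never filled still holds zeros, so it plays as silence.

import numpy as np

BUFFER_SIZE, CHUNK_SIZE = 8, 1024
cells = [np.zeros(CHUNK_SIZE, dtype=np.int16) for _ in range(BUFFER_SIZE)]

def store(chunk_number, chunk):
    """Place an incoming <chunk number, chunk> packet in its cell."""
    cells[chunk_number % BUFFER_SIZE] = chunk

def play(chunk_number):
    """Take the chunk scheduled for this playing time and empty the cell."""
    chunk = cells[chunk_number % BUFFER_SIZE]
    cells[chunk_number % BUFFER_SIZE] = np.zeros(CHUNK_SIZE, dtype=np.int16)
    return chunk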

Find out how to perform the transform and compress the audio

As the title says, this is meant to give some guidance to those who cannot come to class. We already know how to record, play, and send audio in real time, so for now the next step would be to apply the transform to that audio input "x" times and compress the audio before sending it to the receiver.
To do that, we are looking into the PyWavelets library.

Splitting in bitplanes

In each chunk, the soundcard returns an array of samples (mono) or an array of pairs of samples (stereo), with a number of bits/sample (normally, 16 bits). In the current version of intercom, each chunk is packed and sent in a UDP packet.

This issue proposes to split each chunk into a sequence of 16 bitplanes and transmit each one in a different packet. The bit-planes must be selected from the MSbP (Most Significant bit Plane) to the LSbP (Least Significant bit Plane). The receiver must reconstruct the original samples before playing them.

Transmit the bit-planes of both channels interleaved: first the most significant bit-plane of the left channel, then the most significant bit-plane of the right channel, and so on.
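
A one-liner that produces the proposed transmission order, assuming 16-bit stereo samples (channel 0 = left, channel 1 = right):

order = [(plane, channel) for plane in range(15, -1, -1) for channel in (0, 1)]
print(order[:4])   # [(15, 0), (15, 1), (14, 0), (14, 1)]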
