
intercom's Introduction

InterCom

InterCom is a low-latency full-duplex intercom(municator) designed for the transmission of media (at this moment, only audio) between networked users. It is implemented in Python and structured as a set of layers that provide incremental functionality, following a multilevel (one-to-one) inheritance model:

  1. minimal: records/plays raw (CD quality) audio, and sends/receives the chunks of audio to/from another intercom instance.
  2. buffer: delays the playing of chunks to hide the network jitter.
  3. DEFLATE*: uses DEFLATE to compress the chunks.
  4. BR_control*: uses quantization to control the transmission bit-rate.
  5. stereo_MST_coding*: removes spatial (inter-channel) redundancy.
  6. temporal_coding*: removes temporal (intra-channel) redundancy.
  7. basic_ToH: removes psycho-acoustic redundancy, exploiting the expected threshold of hearing (ToH).

intercom's People

Contributors

alcoiz, alexcraviotto, antoniojesu, arocalo, avt276, cobeguel, cristiandc27, gervillaesphoto, gonzabm, hamzaelfallah, jarh57, jesuscazorla, jmmateo14, juanrdzbaeza, laroga, meoko97, miguel07alm, mohahnina, panteleevnikita, pcoloc, raquelgg, rtyui5, smvg, vicente-gonzalez-ruiz


intercom's Issues

5. Provide data-flow control

Packets can be lost when the link between two (or more) interlocutors is congested. A simple algorithm to reduce the congestion is to report to the interlocutor the number of packets that arrived on time.

As can be seen in line 2.1 of the Readme, the player understands the media and can control the flow received from the receiver. Besides, the sender process is also (data-flow) controlled. This means that the buffer of quality layers should keep its fullness level over time.

The size of the buffer is configured by the user. So, given a buffer size, the number of packets that could not be buffered (because they never arrived or arrived late) can be used to request a smaller number of quality layers from the interlocutor.
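
A minimal sketch of this feedback rule (the function name, the policy, and the numbers are only illustrative, not intercom's actual algorithm): the receiver counts how many packets arrived on time during a buffering period and derives from that the number of quality layers to request from the sender.

def requested_layers(buffer_size, on_time_packets, max_layers):
    """Scale the number of requested quality layers by the fraction of
    packets that could actually be buffered (a hypothetical policy)."""
    fraction = on_time_packets / buffer_size      # 1.0 means no losses
    return max(1, round(max_layers * fraction))   # always request at least 1

# Example: with a 16-cell buffer, 12 packets arrived on time and 32 possible
# quality layers, the receiver would ask the sender for 24 layers.
print(requested_layers(16, 12, 32))               # -> 24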

Working with a variable number of levels of the DWT

issue#23 split the wavelet coefficients (represented as 32-bit integers) into 32 bit-planes. The number of levels of the DWT is fixed at 5.

In this issue, modify the code of issue#23 to give the user the option of selecting the number of levels. Introduce this value from the command line. A good Python package for handling command-line arguments is https://docs.python.org/3/library/argparse.html

A straightforward extension of this issue is to also use argparse to introduce the rest of the arguments to the intercom, such as the chunk size, the sampling rate, and the number of channels.
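
A possible argparse front-end (the option names and defaults below are only an illustration, not necessarily the ones used by intercom):

import argparse

parser = argparse.ArgumentParser(description="InterCom")
parser.add_argument("-l", "--levels", type=int, default=5,
                    help="number of levels of the DWT")
parser.add_argument("-s", "--chunk_size", type=int, default=1024,
                    help="samples per chunk")
parser.add_argument("-r", "--sampling_rate", type=int, default=44100,
                    help="sampling rate in frames/second")
parser.add_argument("-c", "--channels", type=int, default=2,
                    help="number of channels")
args = parser.parse_args()
print(args.levels, args.chunk_size, args.sampling_rate, args.channels)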

Fix the port selection bug

The threads expect to receive a tuple as an argument, and when the port (a string with more than one digit) is passed to them, it gets decomposed into a tuple of digits, e.g. "4004" becomes ("4", "0", "0", "4").
Check exactly how this happens and investigate possible solutions.
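
A minimal sketch of the likely cause and fix, assuming the port is passed directly as the args of a thread: threading.Thread unpacks its args iterable when calling the target, so a bare string is split into its characters, while a one-element tuple avoids that.

import threading

def listen(port):
    print("listening on port", port)

port = "4004"

# Buggy: args must be a tuple/list; a bare string is unpacked character by
# character, so listen() would receive ("4", "0", "0", "4").
# threading.Thread(target=listen, args=port).start()

# Fix: wrap the port in a one-element tuple (and convert it to int if needed).
threading.Thread(target=listen, args=(int(port),)).start()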

Capture the packets generated by intercom

Issue #26 implements, for each chunk of audio, the sending of 32 UDP packets (one for each bit-plane) by the emitter intercom. Using Wireshark, capture the packets generated by a chunk of audio (after performing the DWT) and analyze them.

2. Data-flow control and prioritization

Network congestion (and sometimes transmission errors) can cause the loss of packets sent by intercom. The contribution of the content of the 32 packets/chunk sent by intercom is not the same (the most significant bit-planes are more important than the least significant ones). Define a data-flow algorithm for controlling that, if packet loss appears:

(1) The most important packets are transmitted before the least important.
(2) Network congestion is reduced, and if possible, eliminated.
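
A sketch of the prioritization part of such an algorithm (the feedback variable is hypothetical): bit-planes are always sent from the most significant to the least significant, and under congestion only the tail of least important packets is dropped.

def planes_to_send(number_of_planes=32, planes_received_on_time=32):
    """Bit-plane indices to transmit for the next chunk, MSbP first,
    throttled by the feedback reported by the receiver."""
    return list(range(number_of_planes - 1,
                      number_of_planes - 1 - planes_received_on_time, -1))

print(planes_to_send(32, 32))   # no congestion: planes 31 .. 0
print(planes_to_send(32, 20))   # congestion: only planes 31 .. 12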

1. Remove binaural redundancy

Left and right channels are quite similar (sometimes, identical). Encode the right channel as the sample-by-sample difference between the right and the left channel. In other words, compute

R = R - L

and in the receiver intercom, restore the original right channel with:

R = R + L
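
A sketch with NumPy, assuming each chunk is an array of shape (frames, 2) with 16-bit samples (a wider dtype is used during the subtraction to avoid overflow):

import numpy as np

def remove_binaural_redundancy(chunk):
    chunk = chunk.astype(np.int32)      # room for the difference
    chunk[:, 1] -= chunk[:, 0]          # R = R - L
    return chunk

def restore_right_channel(chunk):
    chunk[:, 1] += chunk[:, 0]          # R = R + L
    return chunk.astype(np.int16)

stereo = np.array([[100, 103], [-5, -5]], dtype=np.int16)
assert np.array_equal(restore_right_channel(remove_binaural_redundancy(stereo)), stereo)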

3. Minimize the reconstruction error

The Human Auditory System (HAS) is more sensitive to low-amplitude sounds (loud audio signals carry less auditory information than quiet ones).

Audio signals are represented with positive and negative samples. Negative samples use 2's complement binary representation. Thus, for example:

-1|10 = 1111 1111 1111 1111|2

(-1 in decimal is represented by 16 ones in the 2's complement representation)

The partial reconstruction of positive samples, when we suppose that the missing (not received) bit-planes are 0, works well: if the original sample is small and only the least significant bit-planes are not received, the reconstruction error is small. However, if the transmitted sample is negative and we suppose that the not-received bit-planes are 0, we will generate a big reconstruction error. For example, if we transmit the 2 most significant bit-planes of -1|10, we will reconstruct the number:

1100 0000 0000 0000|2 = -16384|10

This problem can be addressed using different techniques. One is to work with the sign-magnitude representation of the samples. Thus, the sample -1|10 would be represented by:

-1|10 = 1000 0000 0000 0001|2

and if this sample is partially transmitted (using only 2 bit-planes), we would obtain:

1000 0000 0000 0000|2 = -0|10

which generates a small reconstruction error.

Another possibility is to suppose that the unknown bit-planes of a negative sample are all 1 (we know that a sample is negative as soon as its most significant bit-plane, which carries the sign, has been received). Thus, if we receive only the 2 most significant bit-planes of the sample -1|10, we get:

1111 1111 1111 1111|2 = -1|10

which produces a reconstruction error = 0.

Obviously, large negative samples will be reconstructed with larger errors, but in this case, the HAS will mask them.
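
A small numeric check of the argument above (pure-Python bit manipulation, 16-bit samples): keeping only the 2 most significant bit-planes of -1 and zero-filling the rest yields -16384, while assuming that the missing planes of a negative sample are all 1 recovers -1 exactly.

def from_twos_complement(pattern, bits=16):
    """Interpret a bits-wide bit pattern as a 2's complement integer."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern

def keep_top_planes(pattern, kept, bits=16):
    """Zero out all but the `kept` most significant bit-planes."""
    mask = ((1 << kept) - 1) << (bits - kept)
    return pattern & mask

partial = keep_top_planes(0xFFFF, kept=2)        # -1 is 0xFFFF in 2's complement
print(from_twos_complement(partial))             # -16384 (big error)

ones = (1 << (16 - 2)) - 1                       # fill the 14 missing planes with 1s
print(from_twos_complement(partial | ones))      # -1 (error = 0)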

OLD. Show recording and playing volume meters

Compute, for each received (played) and sent (recorded) chunk, the maximum sample and store it in two different shared-memory integer variables. Print both variables in the main process as horizontal bars with a maximum value (a minimal sketch follows the example below).

Example:

You                               Other(s)
-quiet----------------------loud- -quiet----------------------loud-
##                                ####
####                              ##
#######                           # 
######                            # 
########                          ##
####                              #######
##                                ####
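
A minimal sketch of the shared variables and the bar rendering (the variable names, the bar width, and the example readings are only illustrative):

import multiprocessing

recorded_level = multiprocessing.Value('i', 0)   # updated by the recording task
played_level = multiprocessing.Value('i', 0)     # updated by the playing task

def bar(level, maximum=32767, width=32):
    """Render a level in [0, maximum] as a horizontal bar of '#' characters."""
    return '#' * max(1, round(width * level / maximum))

recorded_level.value, played_level.value = 8000, 20000   # example readings
print(f"{bar(recorded_level.value):<34}{bar(played_level.value)}")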

Transform the transmitter and the receiver into a basic intercom

The transmitter only produces audio and the receiver only consumes audio. Obviously, both elements should do both tasks.

Mix the code of the transmitter and the receiver to build a simple_intercom. Basically, what you need to do is to put the original transmitter and receiver elements in different threads so that they work in parallel.

Implement the simple_intercom using threads and using processes, in two different classes. Compare the performance of both alternatives by measuring the lost chunks under different CPU and network conditions.
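
A thread-based skeleton of the structure (the queue below only stands in for the real record/send/receive/play calls, so the sketch is self-contained and runnable):

import threading
import queue

network = queue.Queue()                 # stands in for the UDP socket

def transmitter():
    for i in range(3):
        chunk = f"chunk {i}"            # stands for: record a chunk of audio
        network.put(chunk)              # stands for: send it to the receiver

def receiver():
    for _ in range(3):
        chunk = network.get()           # stands for: receive a chunk
        print("playing", chunk)         # stands for: play it

threading.Thread(target=transmitter).start()
receiver()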

Split the wavelet coefficients in a set of bit-planes

After computing the DWT of a chunk of audio, an array of float coefficients is obtained. Copy the testing_DWT.py module to a new one called testing_bit_plane_representation.py and remove the code that shows the statistics. Modify this code to (a sketch of steps 3 and 4 follows the list):

  1. Capture a chunk of audio.
  2. Perform the DWT of the chunk.
  3. Split the array of wavelet coefficients (represented as integers) in a set of 32 bit-planes.
  4. Reconstruct the wavelet coefficients using the set of bit-planes.
  5. Perform the inverse DWT.
  6. Reproduce the chunk of audio.
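
A sketch of steps 3 and 4 with NumPy, assuming the coefficients have already been converted to 32-bit integers:

import numpy as np

def split(coefficients):
    """Split 32-bit integer coefficients into 32 bit-planes (MSbP first)."""
    bits = coefficients.astype(np.uint32)       # reinterpret the sign bit
    return [(bits >> plane) & 1 for plane in range(31, -1, -1)]

def reconstruct(planes):
    """Rebuild the coefficients from the (complete) list of bit-planes."""
    bits = np.zeros_like(planes[0])
    for index, plane in enumerate(planes):
        bits |= plane << (31 - index)
    return bits.astype(np.int32)

coefficients = np.array([-7, 0, 123456], dtype=np.int32)
assert np.array_equal(reconstruct(split(coefficients)), coefficients)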

5. Let's move to the Discrete Wavelet Transform (DWT) domain!

DWT is a mathematical tool for transforming signals into a different representation domain called "the DWT domain".

            This is the signal domain
                        |
    +-------------------+-------------------+
    |                                       |
    v                                       v
 samples  +-----+ coefficients +------+  samples
--------->| DWT |------------->| iDWT |--------->
          +-----+       ^      +------+
                        |
               This is the DWT domain

In the case of audio, the samples are transformed into coefficients. The range of possible values of the DWT coefficients is wider than that of the original samples (compared to the samples, we will need more bit-planes to represent the coefficients), and most of the energy of the signal is accumulated in a small number of coefficients (therefore, by transmitting only the most energetic coefficients we can reconstruct a good approximation of the original samples). If the DWT is reversible, then when all the bit-planes of all the coefficients are transmitted, the reconstructed signal will be identical to the original.

In this issue, we will use PyWavelets to transform the samples of each chunk (only the channel not processed by the binaural encoding) into coefficients before transmitting them. After the reception, we will use the inverse transform (iDWT) to recover the original samples, or an approximation of them if some of the bit-planes are missing.
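
A minimal PyWavelets round trip (the wavelet name, the number of levels, and the random chunk are only placeholders):

import numpy as np
import pywt

chunk = np.random.randint(-32768, 32768, 1024).astype(np.float64)  # fake mono chunk

coefficients = pywt.wavedec(chunk, wavelet="db5", level=5)    # samples -> coefficients
reconstructed = pywt.waverec(coefficients, wavelet="db5")     # coefficients -> samples

print(np.max(np.abs(reconstructed[:len(chunk)] - chunk)))     # ~0 (floating-point noise)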

Implementation of a simple audio transmitter and a simple audio receiver

The transmitter is an infinite loop that:

  1. Records a chunk of audio.
  2. Sends the chunk to the receiver.

The receiver is an infinite loop that:

  1. Receives a chunk.
  2. Plays the chunk.

Use UDP as the transport protocol. Define the chunk size as an input parameter from the command line. Use Python and PyAudio. Use CD quality (44100 samples/second, 16 bits/sample, 2 channels) for recording the audio.
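
A sketch of both loops using PyAudio and UDP (the port 4444, "localhost", and the chunk size of 1024 frames are placeholder values, not intercom's defaults):

# transmitter (sketch)
import socket
import pyaudio

CHUNK, RATE, CHANNELS = 1024, 44100, 2
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
stream = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=CHANNELS,
                                rate=RATE, input=True, frames_per_buffer=CHUNK)
while True:
    chunk = stream.read(CHUNK)                    # 1. record a chunk
    sock.sendto(chunk, ("localhost", 4444))       # 2. send it to the receiver

# receiver (sketch, run as a separate process)
import socket
import pyaudio

CHUNK, RATE, CHANNELS = 1024, 44100, 2
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("", 4444))
stream = pyaudio.PyAudio().open(format=pyaudio.paInt16, channels=CHANNELS,
                                rate=RATE, output=True, frames_per_buffer=CHUNK)
while True:
    chunk, _ = sock.recvfrom(CHUNK * CHANNELS * 2)   # 1. receive a chunk (bytes)
    stream.write(chunk)                              # 2. play it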

toWavelet module

Create the module that converts the raw data into a sequential numpy array and transforms the raw audio with PyWavelets.

6. Remove the temporal redundancy in the bit-planes using Binary RLE

Bit-planes show temporal redundancy because samples are correlated in time. As a consequence, we can find sequences (runs) of bits all equal to '0' or '1'. A Binary Run-Length Encoder (Binary-RLE) can exploit such redundancy to compress the representation of the bit-planes. Thus, for example, the bit-plane:

00000 1 000 1 0 11 00000 1 0 1 0000000

Can be represented by:

4 0 2 0 0 1 4 0 0 0 6
^
|
+--- This is the first code-word of the code-stream

Where each "code-word" represents the length of a run (the number of consecutive bits with the same value) minus one. Represent each code-word using 8 bits. Send, bit-plane by bit-plane, the shortest representation of the bit-plane (which could be the original representation of the bit-plane rather than the Binary-RLE version). Notice that Binary-RLE can be applied recursively.
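
A sketch of the encoder and decoder over a bit-plane given as a string of '0'/'1' characters (it assumes the decoder knows the value of the first bit and, for simplicity, it does not cap the runs at 256 bits as 8-bit code-words would require):

def brle_encode(plane):
    """Binary RLE: one code-word (run length - 1) per run."""
    runs, current, length = [], plane[0], 0
    for bit in plane:
        if bit == current:
            length += 1
        else:
            runs.append(length - 1)
            current, length = bit, 1
    runs.append(length - 1)
    return runs

def brle_decode(runs, first_bit='0'):
    """Rebuild the bit-plane, alternating the bit value at each run."""
    bits, current = [], first_bit
    for code in runs:
        bits.append(current * (code + 1))
        current = '1' if current == '0' else '0'
    return ''.join(bits)

plane = '0000010001011000001010000000'
codes = brle_encode(plane)
print(codes)                         # [4, 0, 2, 0, 0, 1, 4, 0, 0, 0, 6]
assert brle_decode(codes) == plane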

Compress (and decompress) the mini-chunks (delete me)

Lossless compression of the mini-chunks can help to reduce the bandwidth requirements, increasing the number of transmitted mini-chunks.

Implement a new version of the intercom: progressive_compress_intercom, where the mini-chunks are compressed. Try (at least) the following "text" compressors:

  1. Run-length encoding.
  2. Huffman.
  3. Lempel-Ziv (or any variation of this technique).

And decide (in terms of compression) the most efficient alternative.
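
A quick way to compare off-the-shelf candidates from the Python standard library (zlib combines an LZ77-style dictionary coder with Huffman coding; the silent mini-chunk is just a toy input):

import zlib, bz2, lzma

mini_chunk = bytes(1024)                       # a silent 1 KB mini-chunk
for name, compress in (("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)):
    print(name, len(compress(mini_chunk)), "bytes")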

Transform and de-transform audio

Create a Python module (named testing_DWT.py) that, using PyAudio and PyWavelets, in an infinite loop does:

  1. Capture a chunk of audio.
  2. Print the highest positive and negative sample, and compute the entropy of the chunk.
  3. Transform the chunk of audio using a DWT (Discrete Wavelet Transform).
  4. Print the highest positive and negative wavelet coefficient, and compute the entropy of the transformed chunk.
  5. Compute the inverse transform of the transformed chunk.
  6. Print the highest positive and negative sample, and compute the entropy of the chunk.
  7. Play the chunk of audio.

Note: use only one channel of audio. A possible entropy computation is sketched below.
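
A possible way to compute the per-chunk statistics of steps 2, 4 and 6 (the entropy here is the empirical zero-order entropy, in bits/sample):

import numpy as np

def entropy(samples):
    """Empirical (zero-order) entropy in bits/sample of an integer array."""
    _, counts = np.unique(samples, return_counts=True)
    probabilities = counts / counts.sum()
    return -np.sum(probabilities * np.log2(probabilities))

chunk = np.array([0, 0, 1, -1, 0, 2, 0, 0], dtype=np.int16)
print(chunk.max(), chunk.min(), entropy(chunk))   # 2 -1 ~1.55 bits/sample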

Find out how to capture audio

Capture 44100 samples/second, 16 bits/sample, 2 channels. Write the samples to disk (for now, just to check that the capture works), without any header, using the machine's endianness.

Buffer and sort the chunks of audio

In the current implementation of intercom, the chunks are sent in UDP packets (one packet per chunk), which can be lost or reordered (shuffled) by the network. Lost chunks cannot be recovered, but we can address the network shuffling "problem".

One technique to put the chunks in the right order is to transmit each chunk of audio with a chunk number (so, the structure of a packet would be <chunk number, chunk of audio>), and use the chunk number to sort the chunks of audio in the buffer.

Notice that those buffer cells that are not filled with a (lost) chunk should remain empty (an array of zeros) in the buffer. Zeros do not produce sound when played.
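
A sketch of the buffer (its size and the chunk size are placeholders): chunks are stored by their chunk number modulo the buffer size, and a cell that was never filled still holds zeros, so it plays as silence.

import numpy as np

BUFFER_SIZE, CHUNK_SIZE = 8, 1024
cells = [np.zeros(CHUNK_SIZE, dtype=np.int16) for _ in range(BUFFER_SIZE)]

def store(chunk_number, chunk):
    """Place an incoming <chunk number, chunk> packet in its cell."""
    cells[chunk_number % BUFFER_SIZE] = chunk

def play(chunk_number):
    """Take the chunk scheduled for this playing time and empty the cell."""
    chunk = cells[chunk_number % BUFFER_SIZE]
    cells[chunk_number % BUFFER_SIZE] = np.zeros(CHUNK_SIZE, dtype=np.int16)
    return chunk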

Find out how to perform the transform and compress the audio

As the title says, this is meant to give some guidance to those who cannot come to class. We already know how to record, play, and send audio in real time, so for now the next step would be to apply the transform to that audio input "x" times and compress the audio before sending it to the receiver.
To do that, we are looking into the PyWavelets library.

Splitting in bitplanes

In each chunk, the soundcard returns an array of samples (mono) or an array of pairs of samples (stereo), with a number of bits/sample (normally, 16 bits). In the current version of intercom, each chunk is packed and sent in a UDP packet.

This issue proposes to split each chunk into a sequence of 16 bitplanes and transmit each one in a different packet. The bit-planes must be selected from the MSbP (Most Significant bit Plane) to the LSbP (Least Significant bit Plane). The receiver must reconstruct the original samples before playing them.

Transmit the bit-planes of both channels interleaved: first the most significant bit-plane of the left channel, then the most significant bit-plane of the right channel, and so on.
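
A one-liner that produces the proposed transmission order, assuming 16-bit stereo samples (channel 0 = left, channel 1 = right):

order = [(plane, channel) for plane in range(15, -1, -1) for channel in (0, 1)]
print(order[:4])   # [(15, 0), (15, 1), (14, 0), (14, 1)]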
