tecnologias-multimedia / intercom
Real-time intercom(municator)
The current version of the code cannot handle packet loss because the number of received packets is hardcoded. Even if a parameter is used instead, the problem remains because the number of lost packets is unpredictable.
Restructure the code following the schema indicated at https://github.com/Tecnologias-multimedia/intercom/blob/master/README.md (intercom without data-flow control).
The Human Auditory System (HAS) is more sensitive to low-amplitude sounds (loud audio signals carry less auditory information than quiet signals).
Audio signals are represented with positive and negative samples. Negative samples use 2's complement binary representation. Thus, for example:
-1|10 = 1111 1111 1111 1111|2
(-1 in decimal is represented by 16 ones in the 2's complement representation)
Reconstructing a positive sample by assuming that the missing (not received) bit-planes are 0 works well: if the original sample is small and the least significant bit-planes are not received, the reconstruction error is small. However, if the transmitted sample is negative and we assume that the missing bit-planes are 0, the reconstruction error can be large. For example, if we transmit only the 2 most significant bit-planes of -1|10, we reconstruct the number:
1100 0000 0000 0000|2 = -16384|10
This problem can be addressed using different techniques. One is to work with the sign-magnitude representation of the samples. In that case, the sample -1|10 would be represented by:
-1|10 = 1000 0000 0000 0001|2
and if this sample is partially transmitted (using only 2 bit-planes), we would obtain:
1000 0000 0000 0000|2 = -0|10
which generates a small reconstruction error.
Another possibility is to suppose that the unknown bit-planes of the negative samples are all 1, when we know that the sample is negative. Thus, if we receive only the most significant bit-planes of the sample -1|10, we get:
1111 1111 1111 1111|2 = -1|10
which produces a reconstruction error = 0.
Obviously, large negative samples will be reconstructed with larger errors, but in this case, the HAS will mask them.
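The reconstruction strategies described above can be compared with a small sketch. The function names and the `received_planes` parameter are hypothetical and only serve the illustration; the sketch assumes 16-bit samples and that at least the sign bit-plane has been received.

```python
BITS = 16  # bits per sample, as in the examples above


def reconstruct(sample, received_planes, fill_negative_with_ones):
    """Keep only the `received_planes` most significant bit-planes of a
    two's complement sample and fill in the missing ones."""
    missing = BITS - received_planes
    pattern = ((sample & 0xFFFF) >> missing) << missing  # missing planes = 0
    if fill_negative_with_ones and pattern & 0x8000:
        pattern |= (1 << missing) - 1                    # missing planes = 1
    # Convert the 16-bit pattern back to a signed integer.
    return pattern - (1 << BITS) if pattern & 0x8000 else pattern


def reconstruct_sign_magnitude(sample, received_planes):
    """Same truncation, but on a sign-magnitude representation.
    Assumption: -32768 does not occur (its magnitude needs 16 bits)."""
    missing = BITS - received_planes
    magnitude = (abs(sample) >> missing) << missing
    return -magnitude if sample < 0 else magnitude


print(reconstruct(-1, 2, False))          # -16384
print(reconstruct(-1, 2, True))           # -1
print(reconstruct_sign_magnitude(-1, 2))  # 0
```

With `fill_negative_with_ones=True` a large negative sample such as -16384 would be reconstructed as -1 from its 2 most significant bit-planes, which is the larger error the text expects the HAS to mask.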
In the current implementation of intercom, the chunks are sent in UDP packets (one packet per chunk) that can be lost or shuffled by the network. Lost chunks cannot be recovered, but we can address the network shuffling "problem".
One technique to put the chunks in the right order is to transmit each chunk of audio with a chunk number (so, the structure of a packet would be <chunk number, chunk of audio>), and use the chunk number to sort the chunks of audio in the buffer.
Notice that the buffer cells whose chunk was lost should remain empty (an array of zeros). Zeros do not produce sound when played.
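A minimal sketch of the <chunk number, chunk of audio> packet and of a buffer indexed by chunk number could look as follows. The chunk size, the number of buffer cells, the 16-bit header and all function names are assumptions made for the example, not values taken from the intercom code.

```python
import struct

CHUNK_SIZE = 4                # samples per chunk (hypothetical, tiny for the demo)
BUFFER_CELLS = 8              # hypothetical number of cells in the buffer
HEADER = struct.Struct("!H")  # assumed header: 16-bit chunk number, network order


def pack_chunk(chunk_number, samples):
    """Build a packet: chunk number followed by 16-bit signed samples."""
    payload = struct.pack(f"!{len(samples)}h", *samples)
    return HEADER.pack(chunk_number % 65536) + payload


def unpack_chunk(packet):
    """Split a packet into its chunk number and its list of samples."""
    (chunk_number,) = HEADER.unpack_from(packet)
    count = (len(packet) - HEADER.size) // 2
    samples = list(struct.unpack_from(f"!{count}h", packet, HEADER.size))
    return chunk_number, samples


# Every cell starts as silence (zeros).
buffer = [[0] * CHUNK_SIZE for _ in range(BUFFER_CELLS)]


def store(packet):
    """Place a received chunk in the cell selected by its chunk number."""
    chunk_number, samples = unpack_chunk(packet)
    buffer[chunk_number % BUFFER_CELLS] = samples


def play_cell(chunk_number):
    """Return the chunk to play and reset the cell to zeros, so that a
    chunk lost on a later pass over this cell plays as silence."""
    cell = chunk_number % BUFFER_CELLS
    samples = buffer[cell]
    buffer[cell] = [0] * CHUNK_SIZE
    return samples


# Chunks 1 and 0 arrive swapped, chunk 2 is lost, chunk 3 arrives.
for number, samples in [(1, [5, 6, 7, 8]), (0, [1, 2, 3, 4]), (3, [-1, -2, -3, -4])]:
    store(pack_chunk(number, samples))
played = [play_cell(n) for n in range(4)]
print(played)  # [[1, 2, 3, 4], [5, 6, 7, 8], [0, 0, 0, 0], [-1, -2, -3, -4]]
```

In a real intercom the packets would travel through a UDP socket and the player would start a few chunks behind the receiver; both parts are left out of the sketch.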
For each chunk, the sound card returns an array of samples (mono) or an array of pairs of samples (stereo) with a given number of bits/sample (normally, 16 bits). In the current version of intercom, each chunk is packed and sent in a UDP packet.
This issue proposes to split each chunk into a sequence of 16 bit-planes and transmit each one in a different packet. The bit-planes must be sent from the MSbP (Most Significant bit-Plane) to the LSbP (Least Significant bit-Plane). The receiver must reconstruct the original samples before playing them.
Transmit the bit-planes of the two channels interleaved (first the most significant bit-plane of the left channel, next the most significant bit-plane of the right channel, and so on).
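The splitting and interleaving order described above can be sketched in plain Python. The chunk is modelled as a list of (left, right) pairs of 16-bit samples, and the function names are hypothetical.

```python
BITS = 16  # bits per sample


def split_bitplanes(chunk):
    """Return a list of (plane, channel, bits) tuples, ordered from the MSbP
    to the LSbP, with left (0) and right (1) channels interleaved."""
    planes = []
    for plane in range(BITS - 1, -1, -1):
        for channel in (0, 1):
            # Bit number `plane` of the two's complement pattern of each sample.
            bits = [((frame[channel] & 0xFFFF) >> plane) & 1 for frame in chunk]
            planes.append((plane, channel, bits))
    return planes


def merge_bitplanes(planes, frames):
    """Rebuild the samples from the bit-planes that were received.
    Missing bit-planes are simply left as zeros."""
    patterns = [[0, 0] for _ in range(frames)]
    for plane, channel, bits in planes:
        for i, bit in enumerate(bits):
            patterns[i][channel] |= bit << plane
    # Convert each 16-bit pattern back to a signed sample.
    return [tuple(p - 65536 if p & 0x8000 else p for p in frame)
            for frame in patterns]


chunk = [(1, -1), (-32768, 32767), (0, 1234)]
planes = split_bitplanes(chunk)           # 32 bit-planes, one per packet
print(merge_bitplanes(planes, len(chunk)) == chunk)  # True
```

In practice each list of bits would probably be packed into bytes before being sent (for example with NumPy's `packbits`), and each packet would also need to carry the plane and channel it belongs to; the sketch keeps them as Python tuples for clarity.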
issue#23 split the wavelet coefficients (represented as 32-bit integers) into 32 bit-planes. The number of levels of the DWT is fixed to 5.
In this issue, modify the code of issue#23 to give the user the option of selecting the number of levels. Read this value from the command line. A good Python package to handle command-line arguments is https://docs.python.org/3/library/argparse.html
A straightforward extension of this issue is to also use argparse for the rest of the intercom arguments, such as the chunk size, the sampling rate and the number of channels.
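A minimal argparse sketch for these parameters might look like this. The option names and the default chunk size are hypothetical; the default of 5 levels comes from issue#23 as described above, and the sampling rate and channel count are assumed to be CD-like values.

```python
import argparse


def build_parser():
    """Command-line interface for the intercom (option names are assumptions)."""
    parser = argparse.ArgumentParser(description="Real-time intercom")
    parser.add_argument("-l", "--levels", type=int, default=5,
                        help="number of levels of the DWT")
    parser.add_argument("-s", "--chunk_size", type=int, default=1024,
                        help="samples per chunk (hypothetical default)")
    parser.add_argument("-r", "--sampling_rate", type=int, default=44100,
                        help="sampling rate in samples/second")
    parser.add_argument("-c", "--channels", type=int, default=2,
                        help="number of audio channels")
    return parser


# An explicit argument list is used here so the example runs anywhere;
# a real program would call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["--levels", "3"])
print(args.levels, args.chunk_size, args.sampling_rate, args.channels)
```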
Create the module that converts the raw data into a sequential NumPy array and transforms the raw audio with PyWavelets.
For each received (played) and each sent (recorded) chunk, compute the maximum sample and store it in one of two shared-memory integer variables (one for received chunks, one for sent chunks). In the main process, print both variables as horizontal bars bounded by a maximum value.
Example:
You                                Other(s)
-quiet----------------------loud-  -quiet----------------------loud-
##                                 ####
####                               ##
#######                            #
######                             #
########                           ##
####                               #######
##                                 ####
Capture 44100 samples/second, 16 bits/sample, 2 channels. For now, just to check that the capture is correct, write the samples to disk without a header, using the machine's endianness.
Lossless compression of the mini-chunks can help to reduce the bandwidth requirements, increasing the number of mini-chunks that can be transmitted.
Implement a new version of the intercom, progressive_compress_intercom, where the mini-chunks are compressed. Try (at least) the following "text" compressors:
And decide which alternative is the most efficient (in terms of compression).
Packets can be lost because the link between two (or more) interlocutors is congested. A simple algorithm to reduce the congestion is to send to the interlocutor the number of packets that arrived on time.
As can be seen in line 2.1 of the Readme, the player understands the media and can control the flow received from the receiver. Besides, the sender process is also (data-flow) controlled. This means that the buffer of quality layers should keep its fullness level over time.
The size of the buffer is configured by the user. So, given a buffer size, the number of packets that cannot be buffered (because they never arrived or arrived late) can be used to request a smaller number of quality layers from the interlocutor.
Network congestion (and sometimes transmission errors) can cause the loss of packets sent by intercom. The 32 packets/chunk sent by intercom do not all contribute equally (the most significant bit-planes are more important than the least significant ones). Define a data-flow control algorithm which ensures that, if packet loss appears:
(1) The most important packets are transmitted before the least important.
(2) Network congestion is reduced, and if possible, eliminated.
Capture at the native resolution of the camera (the maximum resolution), in color. Generate a file with the sequence of captured images in PPM (P6) format.
Left and right channels are quite similar (sometimes, identical). Encode the right channel as the sample-by-sample difference between the left and the right channel. In other words, compute
R = R - L
and in the receiver intercom, restore the original right channel with:
R = R + L
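The two expressions above can be sketched as a pair of functions operating on a chunk modelled as a list of (left, right) sample pairs. The function names are hypothetical. Note that the difference of two 16-bit samples can need 17 bits; Python integers handle this transparently, but a fixed-width array type would need a wider data type or modular arithmetic.

```python
def encode_stereo(chunk):
    """Replace the right channel by R - L, frame by frame."""
    return [(left, right - left) for left, right in chunk]


def decode_stereo(chunk):
    """Restore the original right channel with R + L."""
    return [(left, diff + left) for left, diff in chunk]


chunk = [(100, 100), (-32768, 32767), (5, 3)]
encoded = encode_stereo(chunk)
print(encoded)                          # [(100, 0), (-32768, 65535), (5, -2)]
print(decode_stereo(encoded) == chunk)  # True
```

When both channels are identical the encoded right channel is all zeros, which is what makes the representation cheaper to transmit.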
The transmitter only produces audio and the receiver only consumes audio. Obviously, both elements should do both tasks.
Mix the code of the transmitter and the receiver to build a simple intercom, simple_intercom. Basically, what you need to do is to put the original transmitter and receiver elements in different threads, so that they work in parallel.
Implement the simple_intercom using threads and using processes, in two different classes. Compare the performance of both alternatives by measuring the lost chunks under different CPU and network conditions.
In the current code, the method create_subbands copies the coefficients one by one into a buffer, forming a subband, and then appends the subband to the subbands list. Speed up the copy using slicing.
As the title itself says, this is meant to give some guidance to those who cannot come to class. We already know how to record, play and send audio in real time, so for now the next step would be to apply the transform "x" times to that audio input and compress the audio before sending it to the receiver.
To do this, we are looking into the PyWavelets library.
The transmitter is an infinite loop that:
The receiver is an infinite loop that:
Use UDP as the transport protocol. Define the chunk size as an input parameter from the command line. Use Python and PyAudio. Use CD quality for recording the audio.
DWT is a mathematical tool for transforming signals into a different representation domain called "the DWT domain".
           This is the signal domain
                       |
   +-------------------+-------------------+
   |                                       |
   v                                       v
samples   +-----+ coefficients +------+ samples
--------->| DWT |------------->| iDWT |--------->
          +-----+      ^       +------+
                       |
            This is the DWT domain
In the case of audio, the samples are transformed into coefficients. The range of possible values of the DWT coefficients is wider than that of the original samples (so more bit-planes are needed to represent the coefficients than the samples), and most of the energy of the signal is concentrated in a small number of coefficients (therefore, by transmitting only the most energetic coefficients we can reconstruct a good approximation of the original samples). If the DWT is reversible, then when all the bit-planes of all the coefficients are transmitted, the reconstructed signal is identical to the original.
In this issue, we will use PyWavelets for transforming the samples of each chunk (only the channel not processed by the binaural encoding) into coefficients, before transmitting them. After the reception, we will use the inverse (iDWT) transform to recover the original samples, or an approximation of these if some of the bit-planes were missing.
Create a Python module (named testing_DWT.py) that, using PyAudio and PyWavelets, does the following in an infinite loop:
Note: use only one channel of audio.
Those bit-planes where all the bits are '0' must not be sent.
After computing the DWT of a chunk of audio, an array of float coefficients is obtained. Copy the testing_DWT.py module to a new one called testing_bit_plane_representation.py and remove the code that shows the statistics. Modify this code to:
In the main process, there is an infinite loop that prints the number of sent and received chunks at the sender and receiver processes. Include a new parameter for controlling the time that the main process sleeps between consecutive prints.
Bit-planes show temporal redundancy because samples are correlated in time. As a consequence, we can find sequences (runs) of bits that are all equal to '0' or to '1'. A Binary Run Length Encoder (Binary-RLE) can exploit this redundancy to compress the representation of the bit-planes. Thus, for example, the bit-plane:
00000 1 000 1 0 11 00000 1 0 1 0000000
Can be represented by:
4 0 2 0 0 1 4 0 0 0 6
^
|
+--- This is the first code-word of the code-stream
Where each "code-word" represents the length of a run (the number of consecutive bits with the same value) minus one. Represent each code-word using 8 bits. Send, bit-plane by bit-plane, the shortest representation of the bit-plane (which could be the original representation of the bit-plane rather than the Binary-RLE version). Notice that Binary-RLE can be applied recursively.
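A minimal sketch of such a Binary-RLE, using the example bit-plane above, could be written as follows. The function names are hypothetical. Two assumptions go beyond the text: the value of the first bit is returned separately so that the decoder knows how the first run starts, and runs longer than 256 bits (which do not fit in an 8-bit code-word) are rejected instead of being split.

```python
def rle_encode(bits):
    """Return (first_bit, code_words), where each code-word is the length
    of a run minus one."""
    if not bits:
        raise ValueError("empty bit-plane")
    code_words = []
    run = 1
    for previous, current in zip(bits, bits[1:]):
        if current == previous:
            run += 1
        else:
            code_words.append(run - 1)
            run = 1
    code_words.append(run - 1)  # close the last run
    if max(code_words) > 255:
        raise ValueError("run too long for an 8-bit code-word")
    return bits[0], code_words


def rle_decode(first_bit, code_words):
    """Rebuild the bit-plane: runs alternate between the two bit values."""
    bits = []
    value = first_bit
    for code_word in code_words:
        bits.extend([value] * (code_word + 1))
        value ^= 1
    return bits


plane = [int(b) for b in "00000 1 000 1 0 11 00000 1 0 1 0000000" if b != " "]
first_bit, code_words = rle_encode(plane)
print(code_words)                                 # [4, 0, 2, 0, 0, 1, 4, 0, 0, 0, 6]
print(rle_decode(first_bit, code_words) == plane)  # True
```

For this 28-bit example the 11 code-words take 88 bits, so the original representation is the shorter one and would be the one to send; Binary-RLE pays off only on bit-planes with long runs.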
Threads expect to receive a tuple as their argument, and what happens when the port (with more than one digit) is passed to them as a string is that it gets broken down into a tuple of digits. E.g. "4004" becomes ["4", "0", "0", "4"].
Review exactly how this happens and investigate the possible solutions.
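The behaviour can be reproduced with a short sketch. It assumes that the thread is created with `threading.Thread` and that the port is passed through its `args` parameter; the `listen` function is hypothetical. A string is itself iterable, so when it is given as `args` each character becomes a separate positional argument, and `("4004")` without a trailing comma is still just a string.

```python
import threading

received = []  # arguments seen by each thread, for inspection


def listen(*args):
    """Hypothetical thread target that records the arguments it was given."""
    received.append(args)


# args is a plain string: the thread calls listen("4", "0", "0", "4").
thread = threading.Thread(target=listen, args="4004")
thread.start()
thread.join()

# args is a one-element tuple (note the trailing comma): listen("4004").
thread = threading.Thread(target=listen, args=("4004",))
thread.start()
thread.join()

print(received)  # [('4', '0', '0', '4'), ('4004',)]
```

One possible solution is therefore to always write the argument as a one-element tuple, or to pass the port as an integer inside a tuple, for example `args=(4004,)`.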