
The bit, Shannon's information entropy, and Hamming codes. How to measure information and transmit it without loss. Information entropy: defining entropy in terms of information theory

1.4 Source entropy. Properties of the amount of information and of entropy

The amount of information contained in a single elementary message x_i does not fully characterize the source. A source of discrete messages can instead be characterized by the average amount of information per elementary message, which is called the source entropy

H(X) = -∑ p_i · log2 p_i , the sum taken over i = 1…k , (1.3)

where k is the size of the message alphabet.

Thus, entropy is an average measure of the recipient's uncertainty about the state of the observed object.

In expression (1.3), the statistical averaging (i.e., the mathematical expectation of the discrete random variable I(x_i)) is performed over the entire ensemble of source messages. In doing so, all probabilistic links between messages must be taken into account. The higher the source entropy, the more information is carried on average by each message, and the harder it is to store (record) or to transmit such a message over a communication channel. The essence of Shannon entropy is therefore this: the entropy of a discrete random variable is the minimum average number of bits that must be transmitted to convey the current value of that random variable.

The energy required to transmit a message is proportional to the entropy (the average amount of information per message). It follows that the amount of information in a sequence of N messages is determined by the number of messages and the entropy of the source, i.e.

I(N) = N · H(X) .

Entropy as a quantitative measure of the information content of the source has the following properties:

1) entropy is zero if at least one of the messages is certain (i.e., its probability p_i = 1);

2) the entropy is always real-valued, non-negative, and bounded;

3) the entropy of a source with two alternative events varies from 0 to 1;

4) entropy is additive: the entropy of a source whose messages consist of messages from several statistically independent sources equals the sum of the entropies of those sources;

5) entropy is maximal when all messages are equally probable:

H(X)_max = log2 k . (1.4)

For messages x_i that are not equiprobable the entropy decreases. In this connection a further characteristic of the source is introduced, the statistical redundancy of the source alphabet

R = 1 - H(X) / H(X)_max = 1 - H(X) / log2 k , (1.5)

where H(X) is the entropy of the real source and H(X)_max = log2 k is the maximum attainable entropy of the source.
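As a numerical illustration of formulas (1.3)-(1.5), here is a minimal Python sketch (the four-symbol alphabet and its probabilities are invented for the example) that computes the source entropy, its maximum value, and the statistical redundancy:

```python
import math

def source_entropy(probs):
    """Shannon entropy of a discrete source, formula (1.3), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical 4-symbol alphabet with unequal message probabilities
probs = [0.5, 0.25, 0.125, 0.125]

H = source_entropy(probs)           # actual entropy of the source
H_max = math.log2(len(probs))       # formula (1.4): log2 k for k equiprobable messages
R = 1 - H / H_max                   # formula (1.5): statistical redundancy

print(f"H(X)     = {H:.3f} bits")      # 1.750
print(f"H(X)_max = {H_max:.3f} bits")  # 2.000
print(f"R        = {R:.3f}")           # 0.125
```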

The source redundancy defined by formula (1.5) shows the reserve of information contained in messages whose elements are not equiprobable.

There is also the concept of semantic redundancy, which follows from the fact that any idea contained in a message composed of sentences of human language can be formulated more briefly. It is considered that if a message can be shortened without losing its semantic content, then it has semantic redundancy.

Consider discrete random variables X and Y with distribution laws P(X = x_i) = p_i, P(Y = y_j) = q_j and joint distribution P(X = x_i, Y = y_j) = p_ij. Then the amount of information contained in the random variable X relative to the random variable Y is determined by the formula

I(X, Y) = ∑_i ∑_j p_ij · log2 ( p_ij / (p_i · q_j) ) . (1.6)

For continuous random variables X and Y, specified by the probability densities ρ_X(t1), ρ_Y(t2) and ρ_XY(t1, t2), the analogous formula has the form

I(X, Y) = ∫∫ ρ_XY(t1, t2) · log2 ( ρ_XY(t1, t2) / (ρ_X(t1) · ρ_Y(t2)) ) dt1 dt2 .

It is obvious that the joint distribution of X with itself is P(X = x_i, X = x_j) = p_i for i = j and 0 otherwise, hence

I(X, X) = -∑_i p_i · log2 p_i ,

i.e., we arrive at expression (1.3) for computing the entropy H(X).

Properties of the amount of information and of entropy (a numerical sketch illustrating them follows this list):

1) I(X, Y) ≥ 0; I(X, Y) = 0 ⇔ X and Y are independent (neither random variable tells anything about the other);

2) I(X, Y) = I(Y, X);

3) H(X) = 0 ⇔ X = const;

4) I(X, Y) = H(X) + H(Y) - H(X, Y), where H(X, Y) = -∑_i ∑_j p_ij · log2 p_ij is the joint entropy;

5) I(X, Y) ≤ I(X, X); I(X, Y) = I(X, X) ⇒ X = f(Y).
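A small sketch (with an invented joint distribution of two dependent binary random variables) showing how the quantities in properties 1)-4) can be computed directly from formula (1.6):

```python
import math

def H(probs):
    """Entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution p_ij of two dependent binary random variables
p = [[0.4, 0.1],
     [0.1, 0.4]]

px = [sum(row) for row in p]                               # marginal distribution of X
py = [sum(p[i][j] for i in range(2)) for j in range(2)]    # marginal distribution of Y

# Mutual information, formula (1.6)
I = sum(p[i][j] * math.log2(p[i][j] / (px[i] * py[j]))
        for i in range(2) for j in range(2) if p[i][j] > 0)

# Joint entropy H(X, Y) used in property 4)
Hxy = H([p[i][j] for i in range(2) for j in range(2)])

print(f"I(X,Y)           = {I:.4f} bits")                         # non-negative, property 1)
print(f"H(X)+H(Y)-H(X,Y) = {H(px) + H(py) - Hxy:.4f} bits")       # equals I(X,Y), property 4)
```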

CONTROL QUESTIONS

1 What are the types of information?

2 How is continuous information converted to discrete (digital) form?

3 What is the sampling frequency of continuous information?

4 How is the sampling theorem formulated?

5 What is information, encoding, communication channel, noise?

6 What are the main provisions of Shannon's probabilistic approach to determining the amount of information?

7 How is the amount of information contained in one message of a discrete source determined?

8 How is the amount of information per message determined for a source of interdependent messages?

9 What is the source entropy? What are its properties?

10 Under what conditions is the source entropy maximal?

11 How is the amount of information determined? What are its properties?

12 What causes the statistical redundancy of an information source?

"Information is a form of life," wrote the American poet and Essraist John Perry Barlow. Indeed, we are constantly facing the word "information" - it is obtained, transmitted and preserve. Learn the weather forecast or the result of a football match, the content of the film or books, talk on the phone - is always clear, with what kind of information we are dealing with. But what is the information itself, and most importantly, how it can be measured, no one is usually thinking. Meanwhile, the information and methods of its transfer is an important thing that largely defines our life, an integral part of which information technologies have become. The scientific editor of the publishing "Laba.Media" Vladimir Gubailovsky explains what information is as to be measured, and why the most difficult is the transmission of information without distortion.

Space of random events

In 1946 the American statistician John Tukey proposed the name bit (from binary digit) - one of the main concepts of the 20th century. Tukey chose the bit to denote a single binary digit capable of taking the value 0 or 1. In his landmark article "A Mathematical Theory of Communication," Claude Shannon proposed measuring the amount of information in bits. But that is not the only concept introduced and studied by Shannon in the article.

Imagine a space of random events that consists of tossing a single counterfeit coin with heads on both sides. When does heads come up? Clearly, always. We know this in advance, because that is how our space is arranged. Heads coming up is a certain event, that is, its probability equals 1. Do we convey much information if we announce that heads came up? No. The amount of information in such a message we will take to be 0.

Now let us toss a fair coin: it has heads on one side and tails on the other, as it should. Heads or tails coming up are two different events making up our space of random events. If we report the outcome of one toss, that really will be new information. For heads we will report 0, and for tails 1. To convey this information, 1 bit is enough.

What has changed? Uncertainty has appeared in our event space. We now have something to tell to someone who does not toss the coin himself and does not see the outcome. But for him to understand our message correctly, he must know exactly what we are doing and what 0 and 1 mean. Our event spaces must coincide, and the decoding process must unambiguously recover the result of the toss. If the event space of the transmitter and that of the receiver do not coincide, or there is no way to decode the message unambiguously, the information will remain mere noise in the communication channel.

If two coins are tossed independently and simultaneously, there are now four different equally likely outcomes: heads-heads, heads-tails, tails-heads, and tails-tails. To convey the information we will need 2 bits, and our messages will be 00, 01, 10, and 11. There is now twice as much information. This happened because the uncertainty grew. If we try to guess the outcome of such a double toss, we have twice as many chances of being wrong.

The greater the uncertainty of the event space, the more information a message about its state contains.

Let us complicate our event space slightly. So far all the events have been equally likely, but in real spaces not all events have equal chances. Say, the probability that a crow we see will be black is close to 1. The probability that the first passerby we meet on the street will turn out to be a man is about 0.5. But meeting a crocodile on a Moscow street is almost unbelievable. Intuitively we understand that a report of meeting a crocodile has far greater informational value than a report of a black crow. The lower the probability of an event, the more information there is in the message about it.

But let the event space not be so exotic. Suppose we are simply standing at a window and watching the cars driving past. Cars of four colors go by, and we need to report them. To do this we encode the colors: black - 00, white - 01, red - 10, blue - 11. To report exactly which car has driven past, it is enough to transmit 2 bits of information.

But after watching the cars for a long time, we notice that the colors are unevenly distributed: black - 50% (every second car), white - 25% (every fourth), red and blue - 12.5% each (every eighth). Then we can optimize the transmitted information.

Black cars are the most common, so we denote black by 0 - the shortest code - and let the codes of all the other colors start with 1. Of the remaining cars half are white - 10 - and the codes of the remaining colors start with 11. Finally, we denote red by 110 and blue by 111.

Now, when transmitting information about the colors of the cars, we can encode it more compactly.
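Here is a sketch of this variable-length (prefix) code for the car colors, using the frequencies assumed in the text above; it also checks that a bit stream produced by the code decodes back unambiguously:

```python
import random

# Prefix code built from the observed color frequencies
code = {"black": "0", "white": "10", "red": "110", "blue": "111"}
probs = {"black": 0.5, "white": 0.25, "red": 0.125, "blue": 0.125}

# Average code length in bits per car
avg_len = sum(probs[c] * len(code[c]) for c in code)
print(f"average length: {avg_len} bits per car (fixed-length code: 2 bits)")  # 1.75

# Encode a random stream of cars and decode it back bit by bit
cars = random.choices(list(probs), weights=list(probs.values()), k=10)
bits = "".join(code[c] for c in cars)

decoded, buf = [], ""
inverse = {v: k for k, v in code.items()}
for b in bits:
    buf += b
    if buf in inverse:      # prefix property: no codeword is a prefix of another
        decoded.append(inverse[buf])
        buf = ""
assert decoded == cars
```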

Entropy according to Shannon

Let our event space consist of n different events. When the coin with two heads is tossed there is exactly one such event; when one fair coin is tossed there are 2; when two coins are tossed or the cars are observed there are 4. Each event has a probability of occurring. When the coin with two heads is tossed there is one event (heads comes up) and its probability is p1 = 1. When the fair coin is tossed there are two events, they are equally likely, and the probability of each is 0.5: p1 = 0.5, p2 = 0.5. When two fair coins are tossed there are four events, all equally likely, and the probability of each is 0.25: p1 = 0.25, p2 = 0.25, p3 = 0.25, p4 = 0.25. When the cars are observed there are four events with different probabilities: black - 0.5, white - 0.25, red - 0.125, blue - 0.125: p1 = 0.5, p2 = 0.25, p3 = 0.125, p4 = 0.125. The entropy of these distributions is 0, 1, 2, and 1.75 bits respectively - exactly the number of bits we needed to report the outcome.

This is no coincidence. Shannon chose entropy (a measure of uncertainty in the event space) so that three conditions would hold:

  • The entropy of a certain event, whose probability equals 1, is 0.
  • The entropy of two independent events equals the sum of the entropies of these events.
  • Entropy is maximal if all events are equally likely.

All these requirements fully agree with our intuition about the uncertainty of an event space. If there is only one event (the first example), there is no uncertainty. If the events are independent, the uncertainty of the combined experiment equals the sum of the uncertainties - they simply add up (the example with tossing two coins). And finally, if all events are equally likely, the degree of uncertainty of the system is maximal. In the case of tossing two coins, all four events are equally likely and the entropy equals 2; it is greater than in the case of the cars, where there are also four events but with different probabilities - there the entropy is 1.75.
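The entropies of the four event spaces discussed above can be checked with a few lines of Python (a sketch using the same probabilities as in the text):

```python
import math

def entropy(probs):
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

spaces = {
    "coin with two heads":         [1.0],
    "one fair coin":               [0.5, 0.5],
    "two fair coins":              [0.25] * 4,
    "cars (black/white/red/blue)": [0.5, 0.25, 0.125, 0.125],
}

for name, probs in spaces.items():
    print(f"{name:28s} H = {entropy(probs):.2f} bits")
# 0.00, 1.00, 2.00 and 1.75 bits respectively
```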

The quantity H plays a central role in information theory as a measure of the amount of information, of choice, and of uncertainty.

Claude Shannon

Claude Elwood Shannon was an American engineer, cryptanalyst, and mathematician, considered the "father of the information age." He founded the theory of information, which is used in modern high-tech communication systems, and provided the fundamental concepts, ideas, and their mathematical formulations that today form the basis of modern communication technologies.

In his 1948 paper he used the word "bit" to denote the smallest unit of information. He also demonstrated that the entropy he introduced is equivalent to a measure of the uncertainty of the information in a transmitted message. Shannon's articles "A Mathematical Theory of Communication" and "Communication Theory of Secrecy Systems" are considered foundational for information theory and cryptography.

During World War II, Shannon worked at Bell Laboratories on cryptographic systems; this later helped him discover coding methods with error correction.

Shannon made key contributions to the theory of probabilistic schemes, game theory, automata theory, and the theory of control systems - fields of science included in the concept of "cybernetics."

Coding

Neither tossed coins nor passing cars are anything like the digits 0 and 1. To report the events occurring in these spaces, one must devise a way of describing them. This description is called coding.

Messages can be encoded in infinitely many different ways, but Shannon showed that the shortest code cannot be shorter, in bits, than the entropy.

That is why the entropy of a message is the measure of the information in the message. Since in all the cases considered the number of bits used in encoding equals the entropy, the encoding was optimal: messages about events in our spaces cannot be encoded more briefly.

With optimal encoding, not a single transmitted bit can be lost or distorted: if even one bit is lost, the information is distorted. Yet no real communication channel gives 100 percent confidence that all the bits of a message will reach the recipient undistorted.

To eliminate this problem, the code must be made not optimal but redundant. For example, one can transmit along with the message its checksum - a specially computed value obtained by transforming the message code, which the recipient can verify by recomputing it from the message. If the transmitted checksum coincides with the computed one, the probability that the transmission went through without errors is quite high. And if the checksums do not match, a retransmission must be requested. This is roughly how most communication channels work today, for example when transmitting information over the Internet.
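A minimal sketch of the checksum idea using Python's standard zlib.crc32 (the message text itself is invented for the example):

```python
import zlib

def send(message: bytes):
    """Sender: attach a CRC32 checksum to the message."""
    return message, zlib.crc32(message)

def receive(message: bytes, checksum: int) -> bool:
    """Receiver: recompute the checksum; a mismatch means retransmission is needed."""
    return zlib.crc32(message) == checksum

payload, crc = send(b"the quick brown fox")

# Undistorted transmission: the checksums match
assert receive(payload, crc)

# One corrupted bit: the mismatch is detected and a retransmission would be requested
corrupted = bytes([payload[0] ^ 0x01]) + payload[1:]
assert not receive(corrupted, crc)
```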

Natural languages

Consider an event space consisting of messages in a natural language. This is a special case, but one of the most important. The events here are the transmitted symbols (the letters of a fixed alphabet). These symbols occur in the language with different probabilities.

The most frequent symbol (that is, the one found most often in all texts written in Russian) is the space: out of a thousand characters, on average 175 are spaces. The next most frequent is the symbol "о" - 90 per thousand; then come the other vowels: "е" (or "ё" - we will not distinguish them) - 72, "а" - 62, "и" - 62, and only then the first consonant, "т" - 53. The rarest is "ф": this symbol occurs only twice per thousand characters.

We will use a 31-letter alphabet of Russian (in which "е" and "ё", and "ь" and "ъ", are not distinguished). If all letters occurred in the language with equal probability, the entropy per character would be H = 5 bits; but if we take the real character frequencies into account, the entropy is smaller: H = 4.35 bits. (This is almost half of what traditional coding uses, where a symbol is transmitted as a byte - 8 bits.)

But the entropy of a symbol in the language is lower still. The probability of the next character is not fully determined by the average frequency of that character across all texts: which character follows depends on the characters already transmitted. For example, in modern Russian the symbol "ъ" cannot be followed by a consonant. After two consecutive vowels "е" a third "е" is extremely rare (except in a word like "длинношеее", "long-necked"). That is, the next symbol is to some degree predetermined. If we take this predetermination into account, the uncertainty (that is, the information) of the next character is even less than 4.35 bits. By some estimates, the next character in Russian is predetermined by the structure of the language by more than 50%; in other words, with optimal coding all the information could be conveyed by crossing out half of the letters in the message.

Another matter is that not every letter can be crossed out painlessly. The high-frequency "о" (and vowels in general), for example, is easy to drop, but dropping the rare "ф" or "э" is quite problematic.

The natural language in which we communicate with one another is highly redundant, and therefore reliable: if we mishear something or a letter is garbled, the information will still be conveyed.

But until Shannon introduced his measure of information, we could not understand to what degree language is redundant, or to what extent messages can be compressed (and why text files are compressed so well by archivers).
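The connection between redundancy and compressibility can be seen directly with a standard archiver routine (a sketch; the sample strings are invented): redundant, predictable text shrinks strongly, while random bytes hardly shrink at all.

```python
import os
import zlib

redundant = ("the cat sat on the mat. " * 200).encode()   # highly repetitive text
random_data = os.urandom(len(redundant))                   # incompressible noise

for name, data in [("redundant text", redundant), ("random bytes", random_data)]:
    packed = zlib.compress(data, level=9)
    print(f"{name:15s}: {len(data)} -> {len(packed)} bytes "
          f"({len(packed) / len(data):.0%} of original)")
```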

Redundancy of a natural language

In the article "On how we agitating the TCT" (the name sounds like that!) A fragment of the novel of Ivan Turgenev "Noble Nest" was taken and subjected to some transformation: 34% of letters were crossed out of the fragment, but not random. The first and recent letters were left in words, only vowels were drawn out, and not all. The goal was not easy to be able to restore all the information on the transformed text, but also to ensure that the person reading this text has not experienced special difficulties due to letters of letters.

Why is it relatively easy to read such spoiled text? It really does contain the information needed to restore whole words. A native speaker of Russian has a certain set of expected events (words and whole sentences) which he uses in recognition. In addition, the speaker has standard language constructions at his disposal that help him restore the information. A longer mutilated phrase is restored with high probability to the intended sentence, while a shorter fragment of the same phrase may just as plausibly be restored to a different sentence. Since in everyday communication we deal with channels in which there are noise and interference, we are quite good at restoring information - but only information we already essentially know in advance. For example, the mutilated phrase from Turgenev reads well except for its last word, which is not in the modern lexicon: read quickly it is mistaken for a similar modern word, read slowly it simply leaves the reader at a dead end.

Sound digitization

Sound, that is, acoustic oscillation, is a sinusoid. This can be seen, for example, on the screen of a sound editor. To convey the sound exactly, an infinite number of values would be needed - the entire sinusoid. This is possible with an analog connection: he sings, you listen, and the contact is not interrupted as long as the song lasts.

With digital communication over a channel, we can transmit only a finite number of values. Does this mean that the sound cannot be transmitted accurately? It turns out that it can.

Different sounds are differently modulated sinusoids. We transmit only discrete values (frequencies and amplitudes); the sinusoid itself need not be transmitted - the receiving device can generate it. It produces a sinusoid, and onto it is superimposed the modulation built from the values transmitted over the communication channel. There are precise principles governing exactly which discrete values must be transmitted so that the sound at the input of the communication channel coincides with the sound at the output, where these values are superimposed on a standard sinusoid (this is exactly what the Kotelnikov theorem is about).

The Kotelnikov theorem (in the English-language literature, the Nyquist-Shannon or sampling theorem) is a fundamental statement of digital signal processing connecting continuous and discrete signals. It states that any function F(t) consisting of frequencies from 0 to f1 can be transmitted continuously with any accuracy using numbers that follow one another at intervals of 1/(2·f1) seconds.
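A sketch illustrating the Kotelnikov (Nyquist-Shannon) condition: a 3 Hz sine sampled above twice its frequency keeps its identity, while sampling below that rate aliases it onto a lower frequency (the specific frequencies are chosen only for the example).

```python
import math

f_signal = 3.0                       # signal frequency, Hz (chosen for the example)

def sample(f_sampling, n=8):
    """Sample sin(2*pi*f_signal*t) at the rate f_sampling, returning n samples."""
    return [math.sin(2 * math.pi * f_signal * k / f_sampling) for k in range(n)]

fast = sample(f_sampling=8.0)        # 8 Hz > 2 * 3 Hz: the theorem's condition holds
slow = sample(f_sampling=4.0)        # 4 Hz < 2 * 3 Hz: too slow, aliasing occurs

# At 4 Hz the samples of the 3 Hz sine coincide with those of a sign-flipped 1 Hz sine:
alias = [-math.sin(2 * math.pi * 1.0 * k / 4.0) for k in range(8)]
print(all(abs(s - a) < 1e-9 for s, a in zip(slow, alias)))  # True: the two are indistinguishable
print(fast[:4])                      # the adequately sampled signal keeps its shape
```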

Noise-resistant coding. Hamming codes

If the encoded text of Ivan Turgenev is transmitted over an unreliable channel, then even with some errors the result will be quite meaningful text. But if we need to transmit everything exactly, down to the bit, the problem becomes unsolvable: we do not know which bits are erroneous, because errors are random. Even a checksum does not always save us.

That is why today, when transmitting data over networks, one strives not so much for optimal coding, in which the maximum amount of information can be packed into the channel, as for coding from which errors can be recovered - roughly the way we recovered the words when reading the fragment of Ivan Turgenev.

There are special noise-resistant codes that allow information to be restored after a failure. One of them is the Hamming code. Suppose our entire language consists of three words: 111000, 001110, 100011. Both the source of the message and the receiver know these words. We also know that errors occur in the communication channel, but when a word is transmitted no more than one bit of it is distorted.

Suppose we first transmit the word 111000. As a result of at most one error (the erroneous bit is highlighted) it can turn into one of the words:

1) 111000, 0 11000, 10 1000, 110 000, 1111 00, 11101 0, 111001 .

When the word 001110 is transmitted, we can get any of the words:

2) 001110, 1 01110, 01 1110, 000 110, 0010 10, 00110 0, 001111 .

Finally, for 100011 we can receive:

3) 100011, 0 00011, 11 0011, 101 011, 1001 11, 10000 1, 100010 .

Note that the three lists do not intersect pairwise. In other words, if any word from list 1 appears at the other end of the communication channel, the receiver knows for certain that the word 111000 was transmitted; if any word from list 2 appears, the word was 001110; and from list 3, the word 100011. In this case we say that our code corrects one error.

The correction is achieved thanks to two factors. First, the receiver knows the whole "dictionary," that is, the event space of the message's receiver coincides with that of the sender. When the code was transmitted with just one error, a word arrived that was not in the dictionary.

Secondly, the words in the dictionary were chosen in a special way, so that even when an error occurs the receiver cannot confuse one word with another. For example, if the dictionary consists of the words "daughter," "point," "bump" (words differing in a single letter) and the garbled word "boring" is received, the recipient, knowing that such a word does not exist, cannot correct the error: any of the three words might be the right one. If instead the dictionary contains the words "point," "dank," "branch" (words differing in more than one letter) and we know that at most one error is allowed, then "boring" is definitely a corrupted "point" and not a corrupted "dank." In error-correcting codes the words are chosen precisely so that they remain "recognizable" even after an error. The only difference is that the code "alphabet" has just two letters - zero and one.

The redundancy of such coding is very high, and the number of words we can convey this way is comparatively small: we must exclude from the dictionary any word that could, after an error, coincide with a word in the error list of a word already in it (for example, "daughter" and "point" could not both be in the dictionary). But exact transmission of the message is so important that great effort is spent on the study of noise-resistant codes.
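A sketch of the decoding rule for the three-word code above: the receiver simply picks the dictionary word closest in Hamming distance to what arrived (at most one flipped bit is assumed, as in the text).

```python
CODEWORDS = ["111000", "001110", "100011"]

def hamming(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def decode(received: str) -> str:
    """Return the codeword closest to the received word."""
    return min(CODEWORDS, key=lambda w: hamming(w, received))

# Any pair of codewords differs in at least 3 bits, so one error is always correctable
assert min(hamming(a, b) for a in CODEWORDS for b in CODEWORDS if a != b) >= 3

# Flip any single bit of any codeword: decoding still recovers the original word
for word in CODEWORDS:
    for i in range(len(word)):
        corrupted = word[:i] + ("1" if word[i] == "0" else "0") + word[i + 1:]
        assert decode(corrupted) == word
```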

Sensation

The concepts of entropy (or uncertainty and unpredictability) of a message and of redundancy (or predetermination and predictability) correspond very naturally to our intuitive notions of information. The more unpredictable a message is (the greater its entropy, because its probability is lower), the more information it carries. A sensation (for example, meeting a crocodile on Tverskaya) is a rare event; its predictability is very small, and therefore its informational value is great. Information is often equated with news - reports about events that have just happened and about which we still know nothing. But if we are told about what happened a second and a third time in almost the same words, the redundancy of the message becomes great, its unpredictability drops to zero, and we simply stop listening, waving the speaker off with the words "I know, I know." That is why the media try so hard to be first. It is this correspondence to the intuitive sense of novelty, born of truly unexpected news, that played the main role in the fact that Shannon's article, not at all intended for the mass reader, became a sensation, was picked up by the press, and was taken as a universal key to understanding nature by scientists of the most varied specialties - from linguists and literary critics to biologists.

But Shannon's concept of information is a rigorous mathematical theory, and its application outside communication theory is very unreliable. Within communication theory itself, however, it plays a central role.

Semantic information

By introducing the concept of entropy as a measure of information, Shannon gained the ability to work with information - above all, to measure it and to evaluate such characteristics as channel capacity or the optimality of a coding. But the key assumption that allowed Shannon to operate successfully with information was the assumption that the generation of information is a random process that can be successfully described in terms of probability theory. If the process is not random, that is, if it obeys regularities (and not always clear ones, as happens in natural language), then Shannon's reasoning is not applicable to it. Nothing that Shannon says is in any way connected with the meaningfulness of information.

As long as we are talking about symbols (or the letters of an alphabet), we may well reason in terms of random events, but as soon as we pass to the words of a language, the situation changes dramatically. Speech is a process organized in a special way, and here the structure of the message is no less important than the symbols by which it is transmitted.

Until recently it seemed that we could do nothing to even approach measuring the meaningfulness of text, but in the last few years the situation has begun to change, primarily owing to the use of artificial neural networks for machine translation, automatic summarization, information extraction from texts, and the generation of reports in natural language. All these tasks involve transforming, encoding, and decoding the meaningful information contained in natural language. And gradually an idea is forming of the information losses in such transformations - and hence of the amount of meaningful information. But for now these difficult tasks lack the clarity and precision that Shannon's theory of information possesses.

Claude Elwood Shannon (1916-2001) - American engineer and mathematician, founder of information theory, i.e., the theory of processing, transmitting, and storing information

Claude Shannon was the first to interpret transmitted messages and the noise in communication channels from the point of view of statistics, considering both finite and continuous sets of messages. Claude Shannon has been called the "father of information theory."

One of Claude Shannon's most famous scientific works is his article "A Mathematical Theory of Communication," published in 1948.

In this work, investigating the problem of rational transmission of information over a noisy communication channel, Shannon proposed a probabilistic approach to understanding communication, created the first truly mathematical theory of entropy as a measure of randomness, and introduced a measure of the discrete probability distribution p over the set of alternative states of the transmitter and the receiver of the message.

Shannon showed how to measure entropy and derived the formula that became the basis of quantitative information theory:

H(p) = -∑_{i=1}^{n} p_i · log2 p_i .

Here n is the number of characters from which the message can be composed (the size of the alphabet) and H is the binary information entropy.

In practice, the probabilities p_i in this formula are replaced by their statistical estimates: p_i ≈ n_i / N, the relative frequency of the i-th symbol in the message, where N is the total number of symbols in the message and n_i is the absolute frequency of the i-th symbol, i.e., the number of its occurrences in the message.
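A sketch of this statistical estimate: the probabilities p_i are replaced by relative frequencies n_i / N counted in an actual message (the sample string is arbitrary).

```python
import math
from collections import Counter

def empirical_entropy(message: str) -> float:
    """Estimate H by replacing p_i with relative frequencies n_i / N (bits per symbol)."""
    counts = Counter(message)
    N = len(message)
    return -sum((n / N) * math.log2(n / N) for n in counts.values())

msg = "abracadabra"
print(f"{empirical_entropy(msg):.3f} bits per symbol")
# Compare with log2 of the alphabet size actually used in the message
print(f"{math.log2(len(set(msg))):.3f} bits if all its symbols were equiprobable")
```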

In the introduction to his article "A Mathematical Theory of Communication," Shannon notes that in it he extends the theory of communication whose main principles are contained in the important works of Nyquist and Hartley.

Harry Nyquist (1889-1976) - American engineer of Swedish origin, one of the pioneers of information theory

Nyquist's early results on determining the width of the frequency band required for the transmission of information laid the foundation for Claude Shannon's subsequent successes in developing information theory.

In 1928, Hartley introduced the logarithmic measure of information H = K · log2 N, which is often called the Hartley amount of information.

Hartley is responsible for the following important theorem on the required amount of information: if a given set M consisting of N elements contains an element x about which it is known only that it belongs to M, then to find x one must obtain an amount of information about the set equal to log2 N bits.
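Hartley's bound has a direct algorithmic reading: to locate an unknown element of an N-element set, about log2 N yes/no questions suffice. A sketch using binary search over a sorted set:

```python
import math

def questions_needed(items, target):
    """Count the yes/no comparisons binary search uses to locate `target`."""
    lo, hi, asked = 0, len(items) - 1, 0
    while lo < hi:
        mid = (lo + hi) // 2
        asked += 1
        if target <= items[mid]:     # one binary (yes/no) question
            hi = mid
        else:
            lo = mid + 1
    return asked

N = 1024
items = list(range(N))
worst = max(questions_needed(items, x) for x in items)
print(worst, math.log2(N))   # 10 questions, matching log2(1024) = 10 bits
```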

Incidentally, we note that the name bit comes from the English abbreviation of binary digit. This term was first proposed by the American mathematician John Tukey in 1946. Hartley and Shannon used the bit as the unit of measurement of information.

In general, Shannon's entropy is the entropy of a set of probabilities p_1, p_2, …, p_n.

Ralph Vinton Lyon Hartley (1888-1970) - American electronics scientist

Strictly speaking, if X is a finite discrete random variable and p_1, p_2, …, p_n are the probabilities of all its possible values, then the function H(p_1, p_2, …, p_n) specifies the entropy of this random variable; and although X is not, strictly, an argument of the entropy, one may write H(X).

Similarly, if Y is a finite discrete random variable and q_1, q_2, …, q_m are the probabilities of all its possible values, then for this random variable one may write H(Y).

John Wilder Tukey (1915-2000) - American mathematician. Tukey chose the bit to denote one digit in the binary number system

Shannon called the function H(X) "entropy" on the advice of John von Neumann.

Von Neumann urged that the function be called entropy "for two reasons. In the first place, your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more importantly, no one knows what entropy really is, so in a debate you will always have the advantage."

One must assume that this advice of von Neumann's was not a mere joke. Most likely, both John von Neumann and Claude Shannon knew of the informational interpretation of the Boltzmann entropy as a quantity characterizing the incompleteness of information about a system.

In Shannon's definition, entropy is the amount of information per elementary message for a source generating statistically independent messages.

7. Kolmogorov entropy

Andrey Nikolaevich Kolmogorov (1903-1987) - Soviet scientist, one of the greatest mathematicians of the 20th century

A. N. Kolmogorov obtained fundamental results in many areas of mathematics, including the theory of the complexity of algorithms and information theory.

In particular, he played a key role in transforming information theory, formulated by Claude Shannon as an engineering discipline, into a rigorous mathematical science, and in constructing information theory on a foundation essentially different from Shannon's.

In his works on information theory and on the theory of dynamical systems, A. N. Kolmogorov generalized the concept of entropy to ergodic random processes through the limiting probability distribution. To understand the meaning of this generalization, one needs to know the basic definitions and concepts of the theory of random processes.

The Kolmogorov entropy (also called K-entropy) gives an estimate of the rate of information loss and can be interpreted as a measure of the "memory" of a system, or a measure of the rate at which the initial conditions are "forgotten." It can also be regarded as a measure of the chaoticity of a system.

8. Rényi entropy

Alfréd Rényi (1921-1970) - Hungarian mathematician, founder of the Mathematical Institute in Budapest, which now bears his name

He introduced a one-parameter spectrum of entropies, the Rényi entropies.

On the one hand, the Rényi entropy is a generalization of Shannon's entropy. On the other hand, it is at the same time a generalization of the Kullback-Leibler distance (divergence). We also note that it was Rényi who gave the complete proof of Hartley's theorem on the necessary amount of information.

The Kullback-Leibler distance (information divergence, relative entropy) is an asymmetric measure of how far apart two probability distributions are.

Typically, one of the compared distributions is the "true" distribution, and the second is an estimated (tested) distribution that approximates the first.

Let X and Y be finite discrete random variables whose sets of possible values belong to a given common set and whose probability functions are known: P(X = a_i) = p_i and P(Y = a_i) = q_i.

Then the Kullback-Leibler distance D_KL is computed by the formulas

D_KL(X, Y) = ∑_i p_i · log2 (p_i / q_i) ,  D_KL(Y, X) = ∑_i q_i · log2 (q_i / p_i) .

In the case of absolutely continuous random variables X, Y defined by their probability densities, the sums in the formulas for the Kullback-Leibler distance are replaced by the corresponding integrals.

The Kullback-Leibler distance is always a non-negative number, and D_KL(X, Y) = 0 holds if and only if X = Y for the given random variables (that is, their distributions coincide).
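A sketch computing the Kullback-Leibler distance for two invented discrete distributions, showing its asymmetry and that it vanishes only when the distributions coincide:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]      # "true" distribution
q = [0.4, 0.4, 0.2]      # approximating distribution

print(f"D_KL(p, q) = {kl_divergence(p, q):.4f} bits")
print(f"D_KL(q, p) = {kl_divergence(q, p):.4f} bits")   # differs: the measure is asymmetric
print(f"D_KL(p, p) = {kl_divergence(p, p):.4f} bits")   # zero for identical distributions
```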

In 1960, Alfréd Rényi proposed his generalization of entropy.

The Rényi entropy is a family of functionals that quantify the diversity, or randomness, of a system. Rényi defined his entropy as a moment of order α of a measure of an ε-partition (covering).

Let α be a given real number satisfying the requirements α ≥ 0, α ≠ 1. Then the Rényi entropy of order α is defined by the formula

H_α = H_α(X) = (1 / (1 - α)) · ln ( ∑_{i=1}^{n} p_i^α ) ,

where p_i = P(X = x_i) is the probability of the event that the discrete random variable X takes its corresponding possible value, and n is the total number of distinct possible values of X.

For the uniform distribution, when p_1 = p_2 = … = p_n = 1/n, all the Rényi entropies are equal: H_α(X) = ln n.

Otherwise, the values of the Rényi entropy decrease weakly as the parameter α increases. Rényi entropies play an important role in ecology and statistics as diversity indices.

The Rényi entropy is also important in quantum information, where it can be used as a measure of complexity.

Let us consider some particular cases of the Rényi entropy for specific values of the order α:

1. Hartley entropy: H_0 = H_0(X) = ln n, where n is the cardinality of the set of possible values of the finite random variable X, i.e., the number of distinct elements belonging to that set;

2. Shannon information entropy: H_1 = H_1(X) = H_1(p) = -∑_i p_i · ln p_i (defined as the limit as α → 1, which is easy to find, for example, using L'Hôpital's rule);

3. Correlation entropy, or collision entropy: H_2 = H_2(X) = -ln ∑_i p_i² = -ln P(X = Y), where Y is an independent copy of X;

4. Min-entropy: H_∞ = H_∞(X) = -ln max_i p_i.

Note that for any non-negative order (α ≥ 0) the inequality H_∞(X) ≤ H_α(X) always holds. Moreover, H_2(X) ≤ H_1(X) and H_∞(X) ≤ H_2(X) ≤ 2 · H_∞(X).
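A sketch computing the Rényi entropy of one invented distribution for several orders α, checking the special cases and inequalities listed above (natural logarithms, as in the formulas of this section):

```python
import math

def renyi(probs, alpha):
    """Rényi entropy H_alpha in nats; alpha = 1 is handled as the Shannon limit."""
    if alpha == 1:
        return -sum(p * math.log(p) for p in probs if p > 0)   # Shannon entropy
    if math.isinf(alpha):
        return -math.log(max(probs))                           # min-entropy
    return math.log(sum(p ** alpha for p in probs)) / (1 - alpha)

p = [0.5, 0.25, 0.125, 0.125]

H0, H1, H2, Hinf = (renyi(p, a) for a in (0, 1, 2, math.inf))
print(H0, math.log(len(p)))            # Hartley entropy: ln n
print(H1, H2, Hinf)                    # values do not increase with alpha
assert H2 <= H1 and Hinf <= H2 <= 2 * Hinf
```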

Alfréd Rényi introduced not only his absolute entropy (1.15); he also defined a spectrum of divergences generalizing the Kullback-Leibler divergence.

Let α be a given real number satisfying the requirements α > 0, α ≠ 1. Then, in the notation used in the definition of the Kullback-Leibler distance D_KL, the Rényi divergence of order α is defined by the formulas

D_α(X, Y) = (1 / (α - 1)) · ln ( ∑_i p_i^α / q_i^(α-1) ) ,  D_α(Y, X) = (1 / (α - 1)) · ln ( ∑_i q_i^α / p_i^(α-1) ) .

The Rényi divergence is also called the alpha-divergence or α-divergence. Rényi himself used the logarithm to base 2 but, as always, the base of the logarithm is completely unimportant.

9. Tsallis entropy

Constantino Tsallis (born 1943) - Brazilian physicist of Greek origin

In 1988 he proposed a new generalization of entropy that is convenient for developing a theory of nonlinear thermodynamics.

The generalization of entropy he proposed may well come to play a significant role in theoretical physics and astrophysics in the near future.

The Tsallis entropy S_q, often called nonextensive (nonadditive) entropy, is defined for n microstates by the following formula:

S_q = S_q(X) = S_q(p) = K · (1 - ∑_{i=1}^{n} p_i^q) / (q - 1) , q ≠ 1 .

Here K is a dimensional constant, used when dimensionality is essential for understanding the problem.

Tsallis and his supporters propose developing "nonextensive statistical mechanics and thermodynamics" as a generalization of these classical disciplines to systems with long memory and/or long-range forces.

Tsallis entropy differs from all other varieties of entropy, including the Rényi entropy, in that it is not additive. This is a fundamental and important difference.
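A sketch of the Tsallis entropy (with K = 1) for invented distributions. It shows that the Shannon entropy is recovered as q → 1 and that, unlike Shannon entropy, Tsallis entropy is not additive for independent subsystems; instead it satisfies the pseudo-additivity rule S_q(A,B) = S_q(A) + S_q(B) + (1 - q)·S_q(A)·S_q(B).

```python
import math

def tsallis(probs, q, k=1.0):
    """Tsallis entropy S_q; for q = 1 it reduces to the Shannon entropy (in nats)."""
    if q == 1:
        return -k * sum(p * math.log(p) for p in probs if p > 0)
    return k * (1 - sum(p ** q for p in probs)) / (q - 1)

a = [0.7, 0.3]
b = [0.6, 0.4]
ab = [pa * pb for pa in a for pb in b]       # joint distribution of independent subsystems

q = 0.5
S_a, S_b, S_ab = tsallis(a, q), tsallis(b, q), tsallis(ab, q)
print(S_ab, S_a + S_b)                        # not equal: Tsallis entropy is non-additive
print(S_ab, S_a + S_b + (1 - q) * S_a * S_b)  # pseudo-additivity holds exactly

# As q -> 1 the Tsallis entropy approaches the Shannon entropy
print(tsallis(a, 1.0001), tsallis(a, 1))
```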

Tsallis and his supporters believe that this feature makes it possible to build a new thermodynamics and a new statistical theory capable of describing, simply and correctly, systems with long memory and systems in which each element interacts not only with its nearest neighbors but with the entire system as a whole or with large parts of it.

An example of such systems, and therefore a possible object of study with the help of the new theory, is cosmic gravitating systems: star clusters, nebulae, galaxies, clusters of galaxies, and so on.

Since 1988, when Constantino Tsallis proposed his entropy, a significant number of applications of the thermodynamics of anomalous systems (with long memory and/or long-range forces) have appeared, including in the thermodynamics of gravitating systems.

10. Von Neumann quantum entropy

John (János) von Neumann (1903-1957) - American mathematician and physicist of Hungarian descent

Von Neumann entropy plays an important role in quantum physics and in astrophysical research.

John von Neumann made significant contributions to the development of such branches of science as quantum physics, quantum logic, functional analysis, set theory, computer science, and economics.

He was a participant in the Manhattan Project to develop nuclear weapons, one of the creators of the mathematical theory of games and of the concept of cellular automata, and the founder of the modern architecture of computers.

Von Neumann entropy, like any entropy, is associated with information - in this case, with information about a quantum system. In this respect it plays the role of a fundamental parameter that quantitatively characterizes the state and the direction of evolution of a quantum system.

At present, von Neumann entropy is widely used in various forms (conditional entropy, relative entropy, and so on) within quantum information theory.
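A sketch of the von Neumann entropy S(ρ) = -Tr(ρ ln ρ), computed from the eigenvalues of a density matrix with NumPy; the two example states (a pure state and the maximally mixed qubit state) are standard illustrations, not taken from the text above.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho ln rho), computed from the eigenvalues of the density matrix."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]          # 0 * ln 0 is taken as 0
    return float(-np.sum(eigvals * np.log(eigvals)))

# A pure state |0><0|: entropy 0 (complete information about the system)
pure = np.array([[1.0, 0.0],
                 [0.0, 0.0]])

# The maximally mixed qubit state I/2: entropy ln 2 (maximal uncertainty)
mixed = 0.5 * np.eye(2)

print(von_neumann_entropy(pure))    # 0.0
print(von_neumann_entropy(mixed))   # ln 2 ≈ 0.693
```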

Various entanglement measures are directly related to the von Neumann entropy. Nevertheless, a number of works have recently appeared that criticize Shannon entropy as a measure of information and point to its possible inadequacy - and, consequently, to the inadequacy of von Neumann entropy as a generalization of Shannon entropy.

This review (unfortunately brief, and at times insufficiently rigorous) of the evolution of scientific views on the concept of entropy allows us to answer important questions about the true essence of entropy and the prospects of using the entropy approach in scientific and practical research. We confine ourselves to the answers to two such questions.

First question: do the numerous varieties of entropy, both considered and not considered above, have anything in common besides the same name?

This question arises naturally if one takes into account the diversity that characterizes the existing ideas about entropy.

At present the scientific community has not worked out a single answer to this question recognized by all: some scientists answer it affirmatively, others negatively, and still others regard the commonality of entropies of various kinds with a noticeable share of doubt...

Clausius was apparently the first scientist convinced of the universal nature of entropy; he believed that it plays an important role in all processes occurring in the Universe, in particular by determining their direction of development in time.

Incidentally, it is to Rudolf Clausius that one of the formulations of the second law of thermodynamics belongs: "A process is impossible whose only result would be the transfer of heat from a colder body to a hotter one."

This formulation of the second law of thermodynamics is called the Clausius postulate, and the irreversible process referred to in this postulate is called the Clausius process.

Since the discovery of the second law of thermodynamics, irreversible processes have played a unique role in the physical picture of the world. Thus, William Thomson's famous 1852 article presenting one of the first formulations of the second law of thermodynamics was titled "On a Universal Tendency in Nature to the Dissipation of Mechanical Energy."

We also note that Clausius resorted to cosmological language: "The entropy of the Universe tends to a maximum."

Ilya Romanovich Prigogine (1917-2003) - Belgian-American physicist and chemist of Russian origin, winner of the 1977 Nobel Prize in Chemistry

Ilya Prigogine came to similar conclusions. Prigogine believed that the entropy principle is responsible for the irreversibility of time in the Universe and may play an important role in understanding the meaning of time as a physical phenomenon.

To date there have been many studies and generalizations of entropy, including from the standpoint of rigorous mathematical theory. However, the considerable activity of mathematicians in this area is not yet in demand in applications, with the possible exception of the works of Kolmogorov, Rényi, and Tsallis.

Undoubtedly, entropy is always a measure (degree) of chaos and disorder. It is precisely the variety of manifestations of chaos and disorder that makes the diversity of entropy's modifications inevitable.

Second question: can the scope of application of the entropy approach be recognized as extensive, or are all applications of entropy and of the second law of thermodynamics confined to thermodynamics itself and the adjacent branches of physical science?

The history of the scientific study of entropy shows that entropy is a scientific phenomenon discovered in thermodynamics which then successfully migrated to other sciences and, above all, to information theory.

Undoubtedly, entropy plays an important role in almost all areas of modern natural science: in thermal physics, in statistical physics, physical and chemical kinetics, in biophysics, astrophysics, cosmology and information theory.

Speaking of applied mathematics, one cannot fail to mention the applications of the maximum entropy principle.

As already noted, the newest areas of application of entropy are quantum-mechanical and relativistic objects. In quantum physics and astrophysics such applications of entropy are of great interest.

We mention only one remarkable result of black-hole thermodynamics: the entropy of a black hole is equal to a quarter of the area of its surface (the area of the event horizon).

In cosmology it is considered that the entropy of the Universe equals the number of quanta of relic radiation per nucleon.

Thus, the scope of application of the entropy approach is very extensive and includes the most varied branches of knowledge, from thermodynamics and other areas of physical science through computer science to, for example, history and economics.

A. V. Sigal, Doctor of Economics, Crimean University named after V. I. Vernadsky

Information and entropy

In discussing the concept of information, one cannot avoid touching on another, adjacent concept: entropy. The concepts of entropy and information were first linked by C. Shannon.

Claude Elwood Shannon (1916-2001), a distant relative of Thomas Edison, was an American engineer and mathematician who worked at Bell Laboratories from 1941 to 1972. In his paper "A Mathematical Theory of Communication" (http://cm.bell-labs.com/cm/ms/what/shannonday/), published in 1948, he first defined a measure of the information content of any message and the concept of a quantum of information - the bit. These ideas formed the basis of the theory of modern digital communication. Shannon's other paper, "Communication Theory of Secrecy Systems," published in 1949, contributed to turning cryptography into a scientific discipline. He is the founder of information theory, which is used in modern high-tech communication systems. Shannon made an enormous contribution to the theory of probabilistic schemes, automata theory, and the theory of control systems - sciences united by the concept of "cybernetics."

Physical definition of entropy

The concept of entropy was first introduced by Clausius in 1865 as a function of the thermodynamic state of a system,

ΔS = Q / T,

where Q is heat and T is temperature.

The physical meaning of entropy shows up as that part of the internal energy of a system which cannot be converted into work. Clausius obtained this function empirically, experimenting with gases.

L. Boltzmann (1872), using the methods of statistical physics, derived a theoretical expression for entropy,

S = k · ln W,

where k is a constant and W is the thermodynamic probability (the number of permutations of the molecules of an ideal gas that do not affect the macrostate of the system).

The Boltzmann entropy was derived for an ideal gas and is interpreted as a measure of disorder, a measure of the system's chaos. For an ideal gas the Boltzmann and Clausius entropies are identical. Boltzmann's formula became so famous that it was inscribed as an epitaph on his grave. The opinion arose that entropy and chaos are one and the same. Although this entropy describes only ideal gases, it began to be applied uncritically to the description of more complex objects.

Boltzmann himself, in 1886, tried to use entropy to explain what life is. In Boltzmann's view, life is a phenomenon capable of reducing its own entropy. According to Boltzmann and his followers, all processes in the Universe change in the direction of chaos; the Universe is heading toward heat death. This gloomy forecast long dominated science. However, the deepening of knowledge about the surrounding world gradually shook this dogma.

The classics did not associate entropy with information.

Entropy as a measure of information

Note that the concept of "information" is often interpreted simply as "data" or "knowledge," and that the transmission of information is carried out over communication channels. C. Shannon regarded entropy as a measure of the useful information in the process of signal transmission over wires.

To calculate entropy, Shannon proposed an equation resembling the classical expression for entropy found by Boltzmann. An independent random event x with N possible states is considered, p_i being the probability of the i-th state. Then the entropy of the event x is

H(x) = -∑_{i=1}^{N} p_i · log2 p_i .

This quantity is also called the average entropy. For example, we may speak of transmitting a message in a natural language. When transmitting different letters we convey different amounts of information. The amount of information per letter is related to how often that letter is used in all messages produced in the language: the rarer the letter we transmit, the more information it carries.

The quantity

H_i = p_i · log2 (1 / p_i) = -p_i · log2 p_i

is called the partial entropy, characterizing only the i-th state.

Let us explain with examples. When a coin is tossed, heads or tails comes up; this is certain information about the result of the toss.

For the coin the number of equally probable possibilities is N = 2. The probability of heads (or tails) coming up is 1/2, and the entropy equals 1 bit.

When a die is thrown, we obtain information about a particular number of points coming up (for example, three). In which case do we receive more information?

For the die the number of equally probable possibilities is N = 6. The probability of throwing a three is 1/6. The entropy is 2.58 bits. The realization of a less probable event gives more information: the greater the uncertainty before receiving a message about an event (the toss of a coin, the throw of a die), the greater the amount of information that arrives when the message is received.
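A sketch comparing the two examples: the information in a single outcome, -log2 p, and the entropy of each experiment (the 2.58 bits quoted above is log2 6):

```python
import math

def self_information(p):
    """Information, in bits, delivered by an outcome of probability p."""
    return -math.log2(p)

def entropy(probs):
    """Average information (entropy) of an experiment, in bits."""
    return sum(p * self_information(p) for p in probs)

coin = [1 / 2] * 2
die = [1 / 6] * 6

print(f"one coin outcome: {self_information(1/2):.2f} bits, H = {entropy(coin):.2f} bits")
print(f"one die outcome:  {self_information(1/6):.2f} bits, H = {entropy(die):.2f} bits")
# The die outcome is less probable, so it carries more information (2.58 > 1.00 bits)
```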

This approach to the quantitative expression of information is far from universal, since the units adopted do not take into account such important properties of information as its value and its meaning. Abstracting from the specific properties of information (its meaning and its value) with respect to real objects, as later became clear, made it possible to reveal general laws governing information. The units (bits) proposed by Shannon for measuring the amount of information are suitable for evaluating any message (the birth of a son, the result of a sports match, and so on). Later, attempts were made to find measures of the amount of information that would take its value and meaning into account. However, universality was immediately lost: the criteria of value and meaning differ for different processes. Moreover, determining the meaning and value of information is subjective, whereas the measure of information proposed by Shannon is objective. For example, a smell carries an enormous amount of information for an animal, but not for a human being. The human ear does not perceive ultrasonic signals, yet they carry much information for a dolphin, and so on. Therefore the measure of information proposed by Shannon is suitable for studying all kinds of information processes, regardless of the "tastes" of the information's consumer.

Measuring information

From your physics course you know that before measuring any physical quantity you must introduce a unit of measurement. Information also has such a unit - the bit - but its meaning differs under different approaches to defining the concept of "information."

There are several different approaches to the problem of measuring information.