Beginner's to Intermediate Stat-o-Sphere
Follow this link directly to part V of this series, if you wish, or have a look at other tutorials of our collection Stat-o-Sphere.
Review: This is a pre-release article, and we are currently looking for reviewers. Contact us via [email protected]. You can also directly send us your review, or use our open peer-review functions via pubpub (create an account here). In order to comment, mark some text and click on “Start discussion”, or just comment at the end of an article.
Pub preview image generated via Leonardo AI.
Welcome to the fourth part of this series on information theory! With all the previous parts of this series, we hope you have gained a stable and deep insight into the mathematics and the intuition behind information theory and information technology.
Note that you have again entered the beginner’s to intermediate Stat-o-Sphere. This tutorial will only focus on more general / conceptual discussions of information theory — so there will be no real discussion of the actual mathematics in this tutorial. This tutorial also works as an editorial contribution to our Section I, Medical Humanities, though we recommend having at least read the first part of this series to be able to fully follow the background discussed in this part.
In general, the following tutorial is kept rather simple, since it concerns very basic experiences of communication we all share. I hope you have gained the impression so far that information theory is a rather straightforward concept with a pragmatic view of communication. The following won’t be any different. Actually, I hope the discourse won’t appear too simple in some parts. However, the intention of this tutorial was to not jump too far beyond Warren Weaver’s essay “Recent contributions to the mathematical theory of communication”, published in 1949 together with Shannon’s paper AMTC.
So in nuce: before moving on to more advanced concepts of information theory, such as approximate Bayesian methods (e.g., variational inference), we will first have a look at how information theory was received as a theory of communication after its definition in 1948, by science in general as well as by the humanities.
Looking into the literature concerning the linguistic / philosophical, psychological, biological / physical and sociological reception of information theory in the 20th century, one will find very mixed results, especially from the linguistics / philosophy departments. The reasons probably range from mere misunderstanding to gate-keeping strategies.
However, we have discussed a lot of potential traps when trying to understand information theory along the way to this point of this series. In the first chapter we will briefly reconsider some common problems of understanding information theory on the conceptual and mathematical level.
One of the biggest problems in understanding information is that it presents itself as a probabilistic quantity. This is, e.g., present in the Markov chains that we went through in the last part of this series, but also in simple facts such as: the less uncertain we are about an event (e.g., a sign that was received), the less information in the common sense is needed to infer on that event (also think of Vanessa Holmes again!).
When understanding ourselves as receivers of the information around us, we can think of information this way: if nothing around us changes, the information content per second will be low; if everything around us changes, the opposite happens, which, in other words, leaves us with a lot of, and often too many, questions, i.e., too much information to process. The same can be said with Shannon’s sender-receiver model: a single sign having a probability of 1 carries no information at all (0 bits), since there is nothing left to be uncertain about.
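To make this concrete, here is a minimal sketch of Shannon’s measure of information content, the surprisal −log₂(p). The function name is my own choice for illustration, not from the series; the math itself is standard: a certain sign carries 0 bits, a rare sign carries many.

```python
import math

def surprisal_bits(p: float) -> float:
    # Shannon's information content of an event with probability p,
    # measured in bits: the more surprising, the more informative.
    return -math.log2(p)

print(surprisal_bits(1.0))    # 0.0 -> full certainty, no information
print(surprisal_bits(0.5))    # 1.0 -> one yes/no question (1 bit)
print(surprisal_bits(0.125))  # 3.0 -> three yes/no questions (3 bits)
```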
Understanding the relation between information and the physical concept of entropy also turned out to be a little tricky sometimes. One reason is that entropy is often described as “disorder”, which may not be as intuitive as it seems. The problem is that the concept of entropy does not necessarily follow our aesthetic understanding of disorder — the opposite turned out to be the case:
Another example is the dynamics within houses / flats: you have just cleaned your room, so you used energy to reduce the disorder of the room. Now the t-shirts are no longer lying around in the room; they are all in one place — similar to the ink that has just fallen into the water. Over time, the t-shirts eventually end up lying around in the room again (the time-arrow of entropy).
Apart from that, we also discussed that expressions such as “a computer is just 0 and 1”, when referring to what a computer can or cannot do, completely miss the point, e.g., since the base is arbitrary. Since one possible sign is not enough to en- and decode anything (it leads to 0 information), bits are the simplest way to represent and transfer information, and they can be represented well via relay switches or today’s transistors. In physics, the logarithm to the base of e (the natural logarithm) is more common, measuring information in so-called nats instead of bits; the choice of base is merely a choice of unit.
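To illustrate that the base really is arbitrary, here is a small sketch (the function name `entropy` and the example distribution are mine, not from the series): the same distribution yields the same entropy up to a constant unit factor, whether measured in bits (base 2) or nats (base e).

```python
import math

def entropy(probs, base=2.0):
    # Shannon entropy; the logarithm's base only sets the unit
    # (base 2 -> bits, base e -> nats, base 10 -> hartleys).
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
h_bits = entropy(fair_coin, base=2)       # 1.0 bit
h_nats = entropy(fair_coin, base=math.e)  # ~0.693 nats
# Conversion is just a constant factor: H_nats = H_bits * ln(2)
print(h_bits, h_nats, h_bits * math.log(2))
```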
In general, however, the term computation just means: solving a problem / answering a question via an algorithm — any algorithm, not a specific one in a specific hardware representation. This issue of understanding the concepts of information, information processing and computation is, e.g., discussed in this paper by Richards and Lillicrap (2022). So again, “computation is just 0 and 1” or “input and output” is neither witty, nor does it add anything to the story of what computers have to do with humans and what not, apart from again starting an old discourse on a lack of information about the concept of information and information technology itself… (from the first part of this series)
Another, and maybe for some of you much more important, problem of understanding information theory that we have encountered and partially discussed so far is the fact that the concept of communication does not involve the conveyance of meaning — something which appears to be a problem for a lot of people, maybe due to the kind of existentialist impression this makes on language. In fact, the expression “conveying meaning” can only be understood as a metaphor in relation to information theory, since meaning is never actually conveyed in the sense of existing within the world as an independent entity, moving from one point to another (a reification of meaning). Information theory represents more of an inferential relationalism: meaning, as the result of a complex inferential relation to a message, happens “within” a receiver that is related to the world and within the world.
The above circumstance tells us that how we interpret words may not be what was intended, true, plausible etc. Put simply, the above confronts us with uncertainty within communication as such.
If meaning is not conveyed, why can we understand each other? One answer (and to me the most sensible) is: well, since we are humans, we essentially share the same phenotype, so there is a lot we can already relate to in others, just by self-reference (inferring on what we expect we would do) and by making the same or similar experiences in the world. We mentioned before that there is, e.g., work within active inference / predictive processing / dynamical systems theory research that answers the above question roughly this way (Ramstead, Friston, Hipólito 2020). The fact that the phenotype plays a role is of course also a classic biological perspective on what makes communication likely.
In any way, Shannon was only…
[…] concerned with the first level of communication, the transmission of signs, not with meaning. However, information theory should not be considered faulty in that respect; Shannon actually made a point with this: we can communicate via language despite not transmitting meaning. Meaning is something we infer internally and individually — not randomly, but within our contingencies and based on experiences and ourselves as such (e.g., ourselves as humans (our phenotype)). However, what we can influence, or better, what gives us the intention / the drive to speak, is the aim to evoke a meaning in other people via first-level communication. (from the third part of this series)
The fact that meaning is never literally conveyed is something that is mostly completely overlooked in a vast number of receptions of Shannon’s work outside of computer science and physics, even until now. Relevance theory is a very good example: Sperber and Wilson claim, against Shannon and Weaver, that meaning is inferred and does not rely on coding alone, thereby completely overlooking the stochastic and inferential nature of information theory itself (compare the content of this series with Sperber / Wilson 1986, p. 4f.).
Since it can be hard to get a stable understanding of information theory, a book was published under the name “THE mathematical theory of communication” (TMTC) in 1949, which contained an independent expository introduction to information theory by Warren Weaver, together with Shannon’s paper AMTC (“A mathematical…”). Weaver was also a mathematician and scientist, known for his pioneering theoretical work on machine translation (especially his memorandum “Translation” from 1949). Contrary to the impression that the reception of Weaver’s essay makes, it is far from difficult to understand and also has a rather clear perspective on how more complex dimensions of communication relate to information theory.
You can read and download “THE mathematical theory of communication” (TMTC) here. We will also discuss the most important parts extensively in the next chapter.
The expository introduction by Warren Weaver within “THE math…” is titled “Recent contributions to the mathematical theory of communication”. It is not too long, simply written, and involves only little of the math that we discussed in this series. In general it represents an introduction / short tutorial on information theory, as well as an essay on its implications for understanding communication in the wider sense. Note that Weaver’s text mostly does not really read 1940ish and is really easy to follow. One exception is a short passage (chapter 3.2) in which Weaver casually compares noiseless communication with a female secretary accepting and passing on a telegram note, which of course is a symptom of a deeply patriarchal society at that time.
We still deeply recommend reading Weaver’s introduction, as you will see that you know most of it already by now ─ in fact, you will be surprised how quickly you rush through it, given all that we have gone through so far. Weaver’s essay will also give you some more insights into parts of Shannon’s paper that I haven’t included in this work.
Below we will mostly go through those parts of Weaver’s essay that discuss the topic of “meaning”. Weaver’s work starts quite promptly with a very interesting concept discussing three levels of problems of communication, which can be cast as levels of the complexity of inference (also note that the following entails a recursive structure, as A below is the basis of B, and A and B are the basis of C):
LEVEL A. How accurately can the symbols of communication be transmitted? (The technical problem.)
LEVEL B. How precisely do the transmitted symbols convey the desired meaning? (The semantic problem.)
LEVEL C. How effectively does the received meaning affect conduct in the desired way? (The effectiveness problem.) (TMTC, p. 9.)
Note again that neither Weaver nor Shannon presents a mathematical theory of “meaning” or semantics, so apart from level A, nothing ahead is essentially part of the original mathematical information theory discussed by Shannon, but a speculation about future problems and possibilities in the field of information theory by Weaver ─ a kind of essay and tutorial (such as this series). Nevertheless, we will see that Weaver’s predictions of the future turned out to be well chosen from today’s perspective (looking at the math behind a lot of today’s AI concepts, most of all those in computational neuroscience).
The triadic structure of the three levels of communication above is of course no coincidence and again relates well to semiotics (Peirce), as we know from previous parts of this series. We will get to the relation between the triadic structure of abductive and Bayesian inference, semiotics and Weaver’s three-level concept later, but only briefly (we will maybe get further into this relation some other time).
The first level is often denoted as the engineering problem of communication, but both we ourselves and Warren Weaver are well informed already, knowing that this level of communication is the essence of all the other parts ─ regardless of how the other levels work and could be represented mathematically (again, which they are not in Weaver’s essay). Weaver still argues (chapter 3.2, see further below) that only slight adjustments would be needed in order to represent all three levels — which, e.g., means that the stochastic character remains. All of the levels can therefore be looked at as equivalent concerning the basic form of inference (abduction). It is more the question of accuracy and hierarchical depth that makes a distinction of “problems of communication” reasonable.
The first level, as we know, most of all aims for lossless / noiseless transmission and en- and decoding of signs, but it is still essentially the result of a probabilistic inference, as we have mathematically seen and proven in the first part of this series.
We also discussed the term code as such (encoding/decoding), which can be understood as following a principle of simple equivalence: something equals something else, and it can also be looked at from the perspective of Bayes’ rule. The code equivalence on the first level is therefore, importantly, nothing that happens separately from the mathematics, as it can be understood as a pre-set or lossless bi-directionality when inferring from A to its ASCII code and vice versa, considering them as two events in time, or as a relation between the two signs.
A receiver and a source can each be thought of as performing the latter inference (de- or encoding) internally, without having direct insight into the process on the other side ─ depending on their directional role within the process of communication. In general, a code can be seen as a self-explanatory a priori agreement between sender and receiver, since otherwise we would need to decrypt a message.
However, in some ways we could argue that knowing and de- or encoding a code is also something that requires meaning. This is true, since we often also express it this way: “A means ‘some ASCII code’” (for the computer, or for us who know the binaries). This is definitely something we do internally, but it still does not leave the first level of communication. Nevertheless, we can argue that this can be looked at as a special case of the second level of communication (special in the sense that a sign “only means” another sign).
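As a toy illustration of this first-level code equivalence (the two-sign code table below is my own minimal example, not from Weaver), a shared a-priori table makes en- and decoding a lossless, bi-directional inference:

```python
# A-priori agreement between sender and receiver: a fixed code table
# (here: two signs mapped to their 8-bit ASCII codes).
encode_table = {"A": "01000001", "B": "01000010"}
decode_table = {code: sign for sign, code in encode_table.items()}

message = "ABBA"
transmitted = "".join(encode_table[sign] for sign in message)
# On a noiseless channel, the receiver simply inverts the code:
received = "".join(decode_table[transmitted[i:i + 8]]
                   for i in range(0, len(transmitted), 8))
print(received)  # "ABBA" -> the round trip is lossless
```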
Let us jump to the respective level B, to deepen our considerations on “meaning”:
The second level is a little difficult to tell apart from the third level, as we will see, but in general this level is…
[…] concerned with the identity, or satisfactorily close approximation, in the interpretation of meaning by the receiver, as compared with the intended meaning of the sender. (TMTC, p. 9).
Close approximation is easy to explain from what we have discussed before, since it is something we all do every day. However, it gets a little problematic to think of an identity of meaning, since it appears contradictory, and it is hard to understand what this would even mean, as we will see. Weaver also quickly argues against it (p. 11); however, it is important to think about why this is the case (for several reasons, which we will discuss peu à peu below).
[…] if Mr. X is suspected not to understand what Mr. Y says, then it is theoretically not possible, by having Mr. Y do nothing but talk further with Mr. X, completely to clarify this situation in any finite time. If Mr. Y says “Do you now understand me?” and Mr. X says “Certainly, I do,” this is not necessarily a certification that understanding has been achieved. It may just be that Mr. X did not understand the question. If this sounds silly, try it again as “Czy pan mnie rozumie?” with the answer “Hai wakkate imasu.”. I think that this basic difficulty [of approximating the meaning of a message] is, at least in the restricted field of speech communication, reduced to a tolerable size (but never completely eliminated) by "explanations" which (a) are presumably never more than approximations to the ideas being explained, but which (b) are understandable since they are phrased in language which has previously been made reasonably clear by operational means. For example, it does not take long to make the symbol for "yes" in any language operationally understandable. (TMTC, p. 10f.)
First of all, note that the second level may not be what we expect from the concept of meaning in the everyday sense, which probably also includes the third level of communication discussed below. In other words, semantic meaning can also refer to mere lexical meaning / simple facts, e.g., with a focus on mere perception or a generalization of meaning. This can even be simplified down to the level of signs itself again, as mentioned previously (“A ‘means’ some ASCII code”, or a sign related to a sound (a memorized and conditioned experience as meaning)).
The first and second levels of communication are also similar to what Ferdinand de Saussure, a Swiss linguist, discussed with his structuralist conceptions, which you may have encountered before (a dyadic semiotic concept of semantics):
The third level of communication (triadic) will include more individual aspects of what meaning can mean, also in the sense of arts and literature, or our multisensory first-person perspective that is related to, e.g., words. It will especially entail what we may refer to as intention or volition, involving ourselves in general as part of the process of inferring on meaning and generating messages with a certain intention / expectation behind them (e.g., evoking emotions).
It may seem at first glance undesirably narrow to imply that the purpose of all communication is to influence the conduct of the receiver. But with any reasonably broad definition of conduct, it is clear that communication either affects conduct or is without any discernible and probable effect at all. (TMTC, p. 11)
However, note that similar to the situation of 0 information due to full certainty about the message on the first level of communication (signs), certainty about the (intended) meaning of a message in the general sense (the second, semantic, and the third, pragmatic, level of communication) is nothing that we can actually obtain or even literally pursue. A very general answer would again be: we wouldn’t need to communicate if such a state were possible or necessary — we would already know every meaning in advance, we wouldn’t make a difference anymore, we would lose our self in such a state, so to speak.
Also note that the above problem of a “literal equivalence of meaning” on both sides, sender and receiver, is not the same as experiencing something together and having similar thoughts, feelings, or drawing similar conclusions (a relation is not the same as a literal equivalence of the meaning on both sides).
Epistemic problems concerning the literal / referential existence of the above “equivalence of meaning” can be found, e.g., on the mere level of perspective: even though one makes experiences together with others that we can relate to in general (hitting one’s leg), everyone makes their own experience, is their own organism, their own set of contingencies, lives in their own context, so to speak — even though all humans share the same phenotype (human).
Simply put: if I claimed to know exactly what another person is thinking or going through from what that person said, I would have to be that other person and would not need to ask or convey meaning in the first place — but I could then also not reflect on it further, since I would thereby change the information, putting the “equivalence” in jeopardy. Equivalence in that literal sense becomes a senseless iteration, a reified redundancy that melts away the self, actually loses the characteristics of being “meaningful”, and becomes degenerate on higher levels (such as the semantic level of communication collapsing down to the first level of sign equivalence again). So we can argue that the second level does not involve an equivalence of meaning, but rather a contiguity (a relation).
So in nuce: structurally, the second level is equivalent to the first level of communication problems in the sense that it involves the same problem of “meaning” being ‘something’ that is one possibility amongst a contingent range of possible “meanings”. The fact that meaning, in any degree of complexity, is part of a literal optimization problem within communication also shows that meaning cannot be something outside of us and is nothing stable, i.e., it is constantly updated. It is also affected by noise, just on different levels — not only by noise on the channel of the first level of communication, but also by drawing false relations (mishearing or misspelling a word, which can even be done with intention…). We also do not create a separate sentence with a special meaning for every individual situation in life, so to speak. We reuse words in different contexts that are similar enough — or we reflect and expand, differentiating further in order to gain more detail in our expressions.
In any way: level two leaves us with the question of how to verify such an approximation as being “close”, “relevant”, “useful”, “effective” etc. (without essentially only performing more communication under the same limits). In other words, the second level usually involves dynamical processes — how do we handle those? This leaves us with level three of communication.
The third level states that the:
[…] effectiveness problems are concerned with the success with which the meaning conveyed to the receiver leads to the desired conduct on his part. It may seem at first glance undesirably narrow to imply that the purpose of all communication is to influence the conduct of the receiver. But with any reasonably broad definition of conduct, it is clear that communication either affects conduct or is without any discernible and probable effect at all. (TMTC, p. 10).
Here again is Weaver’s illustrative example of how the semantic problem may present itself within communication (second level), and implicitly how it necessarily also involves level three, checking on the effectiveness, as well as level one, checking on the possibilities of communication:
[…] if Mr. X is suspected not to understand what Mr. Y says, then it is theoretically not possible, by having Mr. Y do nothing but talk further with Mr. X, completely to clarify this situation in any finite time. If Mr. Y says “Do you now understand me?” and Mr. X says “Certainly, I do,” this is not necessarily a certification that understanding has been achieved. It may just be that Mr. X did not understand the question. If this sounds silly, try it again as “Czy pan mnie rozumie?” with the answer “Hai wakkate imasu.” (TMTC, p. 10).
When applying the third level of communication to the above example by Weaver, aspects such as trust, hope, confidence, elements of self-identification and meta-reflection etc. can play a role in how such a conversation will proceed. The two could start fighting over a misunderstanding, or start to understand each other better and better, by means of at least following the desire to be understood. In the above Monty Python example, the desire of the sender of information concerns trying to buy something from a shop.
The effectiveness problem is closely interrelated with the semantic problem, and overlaps it in a rather vague way; and there is in fact overlap between all of the suggested categories of problems. (TMTC, p.11f.)
The distinction may become easier when thinking of the third level of communication as the level of poetics, which desires to produce a certain aesthetic, a certain perception, or certain inferred meanings and, e.g., affection on the receiver’s side, and so on. In the case…
[…] of speech, written or oral, it involves considerations which range all the way from the mere mechanics of style, through all the psychological and emotional aspects of propaganda theory, to those value judgments which are necessary to give useful meaning to the words "success" and "desired" in the opening sentence of this section on effectiveness. (TMTC, p.11)
In a more general sense: analogies, metaphors, metonymies, synecdoches — the whole range of rhetorical figurations that follow basic principles such as contiguity (relationality) and equivalence / similarity. This is something I always test with every new generative AI: does it understand certain relational and semiotic effects? In fact, there is no such model so far, and there are several structural reasons why, as we will see.
The problem of distinguishing the second and the third level of communication problems is probably most of all rooted in the everyday concept of “meaning”, which does not itself distinguish well between those two levels. We still hope that you have gained a clear understanding of what these three levels refer to and which hierarchical and heterarchical dependencies can be found between them.
We mentioned several significant aspects of first-level information theory before, and how it is the basis for all three levels interacting with each other. We discussed the fact that communication implies a minimal difference, which relates to the fact that full certainty contains no information (0 bits; so communication demands minimal uncertainty / disorder…). Another aspect is the fact that we communicate via the first level of communication without actually conveying meaning. This is essentially what makes individual meaning possible in the first place and can be looked at as an indirectly required minimal difference on the second level! We therefore learned that inference on meaning is not involved in communication in terms of transmitting signs (which just requires a first level of communication); it happens on each side, sender and receiver, respectively and individually — apart from aspects such as sharing the phenotype (human), speaking the same language etc., which at least make successful communication on such a level more likely (in other words: they reduce, upfront, the questions that would need to be asked in order to communicate successfully).
The mathematical theory of the engineering aspects of communication, as developed chiefly by Claude Shannon at the Bell Telephone Laboratories, admittedly applies in the first instance only to problem A, namely, the technical problem of accuracy of transference of various types of signals from sender to receiver. But the theory has, I think, a deep significance which proves that the preceding paragraph is seriously inaccurate. Part of the significance of the new theory comes from the fact that levels B and C, above, can make use only of those signal accuracies which turn out to be possible when analyzed at Level A. Thus any limitations discovered in the theory at Level A necessarily apply to levels B and C. But a larger part of the significance comes from the fact that the analysis at Level A discloses that this level overlaps the other levels more than one could possibly naively suspect. Thus the theory of Level A is, at least to a significant degree, also a theory of levels B and C. I hope that the succeeding parts of this memorandum will illuminate and justify these last remarks. (TMTC, p.12)
The next quote may be the reason that caused a lot of misunderstanding of information theory in the field of the humanities, even though it is importantly very short and placed within the last chapter (relevance theory is again a very good example — compare the content of this series with Sperber / Wilson 1986, p. 4f.).
It is almost certainly true that a consideration of communication on levels B and C will require additions to the schematic diagram on page 7, but it seems equally likely that what is required are minor additions, and no real revisions. Thus, when one moves to levels B and C, it may prove to be essential to take account of the statistical characteristics of the destination. One can imagine, as an addition to the diagram, another box labeled “Semantic Receiver” interposed between the engineering receiver (which changes signals to messages) and the destination. This semantic receiver subjects the message to a second decoding, the demand on this one being that it must match the statistical semantic characteristics of the message to the statistical semantic capacities of the totality of receivers, or of that subset of receivers which constitute the audience one wishes to affect. (TMTC, p.32)
Below you will find a figure that reconstructs the modification that Weaver speculated upon for Shannon’s sender-receiver model: semantic noise and a semantic receiver are added to the model.
RECALL: even though Weaver speaks of a semantic receiver, we know from the rest of the essay and the concept in general that such inference is still something performed on each side individually. The reason is again that the first level of transmission (the middle part) does not involve meaning at all — it does not need it in order to make communication possible. However, this small passage may have sparked a lot of misunderstanding concerning the sender-receiver model in other, non-mathematical fields.
We can recommend the paper by Seising et al., also concerning the history of information theory. It is not too difficult and should extend the knowledge gained in this series well. However, Seising et al. also highlight the fact that Weaver himself…
“[…] underscored very clearly the fact that Shannon’s theory did not even touch upon any of the problems contained in levels B and C, that the concept of information therefore must not be identified with the ‘‘meaning” of the symbols.” (Seising et al.).
The authors also assume a relation to semiotics concerning the formulation of the three levels of general communication, arguing that Weaver’s ideas are clearly related to the work of C.W. Morris. Roughly put, Morris argues that language / communication is a relation between syntax, semantics and pragmatics — which conceptually follows the three levels of communication problems well (as well as a general triadic semiotic hierarchical scheme, of course).
However, the third level is not represented in this extended model of communication. We can still guess from what we know that the third level would be all about weighing in on how well the semantics were received and on how well they were transmitted (spoken words).
Weaver closes his speculation with the following important words:
The idea of utilizing the powerful body of theory concerning Markoff processes seems particularly promising for semantic studies, since this theory is specifically adapted to handle one of the most significant but difficult aspects of meaning, namely the influence of context. One has the vague feeling that information and meaning may prove to be something like a pair of canonically conjugate variables in quantum theory, they being subject to some joint restriction that condemns a person to the sacrifice of the one as he insists on having much of the other. (TMTC, p. 28)
This is actually exactly where we are mathematically heading in further parts of the series. Especially active inference / the free energy principle is a concept that can be understood as a very direct extension of classic information theory (and ML), trying to represent all of the above levels of communication mathematically, in the way Weaver anticipated (and beyond, since it can be applied to any kind of system or “thing”, as it relies on a mathematical physics principle; e.g., see this interview with Maxwell Ramstead).
Since we are living in 2023, we can say that a lot of the prognoses from mathematics actually came true: it is reasonable to look at language as the result of a stochastic process. Given the results of, e.g., ChatGPT 4 (Aug. 2023), and keeping in mind that such a model is arguably still rather simply structured, the output often appears very impressive and even shocking. However, current commercial and open-source models, which have been trained and tuned enough to be interesting, are mere passive models. They do not entail anything that could represent a third level of communication by itself, which would involve a degree of autonomy and especially self-organization (amongst other conceptual constraints, which are also often discussed as a threat). In other words, they do not model the world, as we (try to) do, but language and images.
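To make the “language as the result of a stochastic process” idea concrete, here is a minimal sketch in Python of the kind of Markov-chain (n-gram) text generation Shannon already experimented with on paper. The function names (`build_chain`, `generate`) and the toy corpus are of course just made up for this illustration — real language models are vastly more complex, but the sampling principle (predicting the next token from observed frequencies) is the same in spirit:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Build a first-order Markov chain: map each word to the
    list of words observed to follow it (duplicates preserved,
    so frequent successors are drawn more often)."""
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=8, seed=42):
    """Sample a word sequence by repeatedly drawing a random
    successor of the current word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break  # dead end: no observed successor
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = ("the cat sat on the mat the cat saw the dog "
          "the dog sat on the mat")
chain = build_chain(corpus)
print(generate(chain, "the"))
```

Even this tiny model produces locally plausible word order while clearly modeling nothing but the text itself — a caricature of the “first level” of communication.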
Ethically, the arguably biggest problems, disguised by a lot of overly excited discussion around the structural capabilities of AI, are the facts that current ML models are trained on biased data (which is ‘imprinted’ in the output, so to speak), and that applications of such models in certain areas are, e.g., based on modern phrenological, antisocial and in general pseudoscientific theoretical approaches (e.g., “Detecting deception using machine learning with facial expressions and pulse rate”; we also recommend this recent “Rolling Stone” article about Joy Buolamwini, Timnit Gebru, Safiya Noble and Rumman Chowdhury — researchers approaching issues with biases).
Still, the right and ethical application of currently structured AI systems becomes a difficult task when a mere “predictive bias” (in the sense of a weight) can be seen as the actual purpose of a machine learning system, since we actually wish to be able to address human individuality / expectation in the output of, e.g., ChatGPT (i.e., relate to it). Since the output relies on the concept of weighting to create text-to-text or image-to-text contexts (as the result of training), it is not easy to control or represent what should be an output and what not in general (especially when this is all you have got to model on: more or less data that is stolen from the internet).
To approach this problem, ChatGPT is therefore fine-tuned via reinforcement learning from human feedback (RLHF; an attempt to ‘imprint’ solutions for second and third level problems into the model, so to speak). The current, rather simple AI models are also in general trained with obscene amounts of random (in the sense of undifferentiated and often contextless) data from the internet (also without caring about copyright), and such models cannot ‘self-organize’ in any way. The easy proof is the need for RLHF.
Copyright is also a big problem for ChatGPT, since it often just reproduces whole passages of text or code taken from uncredited sources. When the output of ChatGPT is included in texts without rigorously quoting at least ChatGPT, it can always be considered a kind of automated plagiarism (see this short article by Iris van Rooij). Image generation, which I used in this series as an educational demonstration of advanced information theoretic methods (the title images of the articles; controversial?), turns out to be a quite different issue. In art, adapting a style does not constitute plagiarism. Art therefore inherently embraces a “collage of works”, taking up and further developing a style or motifs. Mathematically, stable diffusion also works quite differently from ChatGPT and doesn’t just output whole images that don’t differ from the original (which would still be plagiarism, if not at least credited to the AI). See this article from the New Yorker for more information on that topic. Producing art and producing scientific papers are therefore two quite different issues in a lot of ways.
However, even though the concept and application of “neural networks” have little to do with what we consider human communication, the above limitations and problems of current AI models still relate well to the second level and partially the third level of communication and their respective constraints: such models approximate meaning, but by themselves lack precision weighting or an intellectual policy arising from third-level interaction, i.e., from cultural and social interaction and reflection on it (which would imply something like self-organization or autonomy to some degree, e.g., a human author with a transparent or communicable intellectual policy). It is of course again a question of how to define semantics in order to ascribe it, even to some degree, to, e.g., ChatGPT’s capabilities; in general such a model is again not structured to do more than relate text to other texts / images, so any comparison to human multisensory inference relates more to a consumer’s perspective, where an AI model might be convincing enough for many when it is writing texts which sound as if they were from a human. Emily Bender et al. argue that current language models are nothing else than a “stochastic parrot”; in other words, they do not move beyond the first level, or a rather abstract second level, of communication. In fact, they argue that such models lack the ability to communicate as such:
Text generated by an LM is not grounded in communicative intent, any model of the world, or any model of the reader’s state of mind. It can’t have been, because the training data never included sharing thoughts with a listener, nor does the machine have the ability to do that. This can seem counterintuitive given the increasingly fluent qualities of automatically generated text, but we have to account for the fact that our perception of natural language text, regardless of how it was generated, is mediated by our own linguistic competence and our predisposition to interpret communicative acts as conveying coherent meaning and intent, whether or not they do. The problem is, if one side of the communication does not have meaning, then the comprehension of the implicit meaning is an illusion arising from our singular human understanding of language (independent of the model). Contrary to how it may seem when we observe its output, an LM is a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning: a stochastic parrot. (Bender, Gebru et al., 2021)
I personally find computational neuroscientific paradigms arguing human inference to be a result of ‘working with probabilities’ in some way very convincing, since this relates well to the contingent quality of language, experience and reasoning, as well as learning. Meaning is also in general often conflated with a somewhat strong subjectiveness and “mineness” (problem of solipsism), which is problematic, since language is a tool for interaction between humans (and even between humans and other lifeforms, between animals, etc.). Still, the general relational structure of, e.g., an LLM lacks too many relational characteristics to make it reasonably comparable with human inference, even if people try to use it to solve problems on the second and third level of communication. The above argument by Bender, Gebru et al. is especially interesting as it argues that the output of an LLM is not part of a communication process, i.e., lacks a proper relational structure, as we have discussed it so far. It lacks a “model of the reader’s mind” (note that a parrot might have a better model of the mind of a human than an LLM, even though its output may not be as complex; so keep in mind that “stochastic parrot” is better understood as a mere rhetorical comparison).
Upscaling, i.e., more data, more parameters, more computation, is often argued to be a solid solution to such problems of semantics and optimality, which arguably let such models evolve from more or less toy models (GPT-2, for example) to models with quite stunning complexity in their output that people use to do work with (ChatGPT 3.5 and 4, for example). So more data, more computation etc. is something that unfortunately still plays a big role in further developing at least the current types of commercial and open access AI systems. The problem is, again: more computation means more money that has to be invested and bigger companies that own such models — and more data (texts, images) that has to be taken from somewhere, and therefore more work that has to be exploited in some form or another.
The above limitations and problems fatally lead to general issues in the transparency of and the control we can have over AI tools in their currently applied forms: where does the data come from, and who decides what the “right bias” should be? The huge power of only a few AI companies over their products and subsequently over current and future societies is therefore an important issue for a responsible use of AI technology (e.g., see this interview with Meredith Whittaker). What we can definitely assume is that ML systems such as ChatGPT are not conscious, since they are, again, simply not structured this way (such a system is just a model of language related to language, and of language related to images — not of a complex body related to the world, related to our society etc., not even of the brain).
Still, none of the above potential problems or misunderstandings of applying AI are surprising, since they can be anticipated with ease by looking at the mathematical structure of such models. We therefore think that computer science is one of the most important fields people should learn about. The idea that such models suddenly become conscious and start to kill everybody around them is a rather uninformed fear and not a problem societies should focus on. We also recommend this blog article by David Sumpter for some more easily accessible details on the above issues: “Worried about the rise of AI? Here is the research you should know about”.
Before we get back to a more life science approach, let us look at one exemplary output of ChatGPT 3.5, in case you have never used it (and for some comparison with the discussions above):
PROMPT: Name five reasons why a racoon is the perfect companion during a zombie apocalypse. (ChatGPT 3.5, 08.08.2023)
ANSWER: While a raccoon might not be the most conventional choice for a companion during a zombie apocalypse, let's explore some creative reasons why they could be helpful:
Stealth and Agility: Raccoons are known for their exceptional agility and ability to move quietly. This skill could come in handy when navigating through zombie-infested areas, allowing them to scout ahead without attracting unwanted attention.
Foraging Expertise: Raccoons are highly resourceful and adept at foraging for food. During a zombie apocalypse, when supplies are scarce, having a raccoon companion could mean accessing hidden food sources and scavenging for crucial sustenance.
Small and Unassuming: Raccoons are relatively small in size, making them less conspicuous than larger animals. Their unassuming appearance might help avoid drawing attention from both zombies and hostile survivors, making them valuable stealth companions.
Alertness and Warning: Raccoons are naturally alert animals with keen senses. They can detect potential dangers or approaching zombies, providing valuable early warnings to their human companions, helping them stay one step ahead of potential threats.
Distraction and Diversion: Raccoons are known to be mischievous and can cause distractions when needed. During tense situations with zombies or hostile groups, a raccoon's playful antics could buy valuable time for the human survivor to escape or regroup.
While these reasons may seem entertaining, it's important to remember that a raccoon, being a wild animal, might not be a practical or safe choice for a zombie apocalypse companion. In such a situation, human allies with specialized skills, reliable equipment, and teamwork would be the best bet for survival. (link to original chat)
I must admit, I think differently about a possibly valuable companionship with raccoons during a zombie apocalypse…. :P
Anyway, systems such as ChatGPT only involve a rather (often dangerously) playful technical and most of all consumer’s perspective on AI technology and its concepts — and not a perspective that explains human inference whatsoever. Some therefore argue for a rather strict distinction between human and computational inference. However, recall the problem with the term computation: it does not actually refer to a specific hardware or whatever framework or system one is thinking of.
A negative example of making the above mistake while trying to distinguish human inference from artificial inference is the psychiatrist and philosopher Thomas Fuchs’ rather polemic collection of articles and essays “In Defence of the Human Being” (especially the first two essays; German title: “Verteidigung des Menschen”, Suhrkamp 2021). The essays mostly discuss the comparison between human and computational inference as a threat to humanity, arguing for a strict segregation that disallows any comparison, since, Fuchs argues, it causes “being human” to lose its meaning on several levels. My critique is that Fuchs misses the chance to educate people on information technology and computationality in general; instead he calls out every inference happening in silico as being the same, in order to explicitly mark it as a potential enemy that weakens the legitimacy of the human (with a very explicit segregative undertone throughout). I at least was not convinced that this is a reasonable way of approaching computational neuroscience and AI research and technology in general.
Still, the question of how well mathematical concepts represent the above three levels of communication within fields such as biology and psychology is answered very differently from the question of how well a generated image fits a prompt. Apart from the fact that physiological plausibility and other aspects play a major role in evaluating computational models of human or biological inference, the life sciences don’t want an output that is optimal for some random goal, but a model that fits well to the organization and behavior of a biological system of a specific kind and complexity. The fields of computational neuroscience, psychophysics and mathematical psychology therefore argue quite differently about how to reflect on the structure and concepts behind different mathematical models that are supposed to refer to cognition, brain functions, intelligence, ourselves as a whole organism etc. Again, the whole discourse about cybernetics, information theory and biological plausibility leads back to the question that, e.g., Schrödinger asked, which we mentioned in the second part of this series: asking “What is life?” in the sense of ‘How does life maintain itself in order to overcome entropy (“disorder”) in space and time?’ (the second law of thermodynamics).
As mentioned in other parts of this series, information theory, systems theory and cybernetics conceptually merged into what is now more or less called (dynamical) systems theory, with different theoretical movements developing around it. This is not surprising, since all of the above concepts were about ideas of information processing in different kinds of systems (in vivo, in silico…). The so-called Macy conferences, held between the mid 40s and 50s, played a major role in bringing together scientists from several disciplines. Further discourses that emerged from this movement, e.g., in the 80s and 90s, were chaos theory, the concept of complex adaptive systems, network theory, further neurocognitivist approaches, constructivism, social systems theory and other attempts to make use of the basic concepts of information processing, starting with people such as Shannon, Wiener, Weaver, Helmholtz, von Neumann… — and in general the discourses around statistical thermodynamics in physics.
Two important protagonists in the further development of systems theory as a biological, as well as sometimes rather philosophical, theory were Humberto Maturana and Francisco Varela — two biologists, neuroscientists and philosophers from Chile.
In nuce, both started to describe life as an autonomous, autopoietic (self-creating / self-organizing) process / system that emerges from complex relations, in order to describe the structure of biological and cognitive systems maintaining themselves within their bounds and contingencies over time. Varela (together with Evan Thompson) also included a lot of epistemological concepts in the discourse around systems theory, such as embodiment as an alternative to substance dualistic approaches (Husserl, Merleau-Ponty). The epistemological aspect of embodiment is rather simple and enlightening: your hand is given in two ways, a) as an object you see or a conceptual object (body as object, German “Körper”), and b) as a sensation, which also entails an experience of “mineness” (body as subject, German “Leib”; for a medical example see, e.g., Ciaunica et al. on the topic of depersonalization and derealization).
The whole current psychological mindfulness movement was also influenced by the above (as much as by any other wave in cognitivism, though). With the latter mindfulness movement, the discourse around life and cognition often successively shifted away from a scientific perspective. To put it simply: the mindfulness scene often has an esoteric, lifestyle and in parts a mere pseudo-science problem. Exploitation of foreign cultures is also a common critique of it.
First of all, the fact that ancient texts from whatever cultural source are taken into account in psychology, neuroscience and neurophilosophy is not unusual — in the past these were especially writings from today’s India, China and Japan (e.g., Buddhism and earlier movements). A simple scientific reason to do so is: ancient texts from people who were able to write and had a lot of time to contemplate in one form or another are valuable in order to compare earlier or unknown concepts (long forgotten and / or culturally unknown) with today’s concepts of life, experience, cognition etc. under the perspective of modern science. For instance, such “conceptual archeology” has the potential to reclaim perspectives from the past without the influence of current social dogmas. In general, artefacts of any kind from the past can partially show us how differently and how similarly we lived and evolved compared to humans in other contexts of being (such as time, culture or region).
On the other hand, one obvious but underestimated problem is that reading ancient eastern literature is not an easy task, and translations often strongly differ from the original (recall Weaver’s three levels of problems of communication!). For example, the German version of the Tao te king (credited to Laotse) by Richard Wilhelm, a Christian missionary(!), strongly deviates from more scientific translations — such that a lot of the original thoughts and ideas were changed (strongly biased). Nevertheless, Wilhelm’s version is commonly found in bookstores, even though it has a troubling history and mostly only little commentary on it.
Another problem is the adoption of a lifestyle in a western context. In general, and especially given western colonial history, it is controversial when people try to adopt an absorbing lifestyle, modified ideologies or a religion without reflecting on it, or without trying to make its insights on humanity accessible in a secular, critical, respectful and in general scientific way (so it potentially serves everybody). One particular critique of the mindfulness movement is therefore the mentioned trend of monetizing and exploiting foreign cultures for reasons of promoting and literally selling “alternative” western lifestyles in several ways (intentionally an oxymoron). Yoga especially is something that, at least in Germany, is often not questioned in any way, even though it follows highly problematic (and often science-washed) ideals and is built upon patriarchal structures (the concept of a “guru” as a “wise superior male”). Another example is promoting the impression that, e.g., Buddhism is a somewhat mild, peaceful or magically superior and “different” approach to the world and self. For further reflections on this, I can recommend this talk with Evan Thompson on his book “Why I Am Not a Buddhist” (Thompson also greatly influenced the discourse on systems theory, as mentioned above).
Another branch of the further development of information theoretic / cybernetic concepts is of course the rather physics-oriented and computational movement that was, and still is, trying to further develop simulations of biological and cognitive dynamical systems. However, such development is strongly dependent on hardware and software technology. Using software such as R for complex statistical analysis wasn’t as widely possible 50 years ago as it is today, when even most of today’s smartphones are enough to run a small data science lab.
For example, the first neural network in the modern sense was already defined by Warren McCulloch and Walter Pitts (a psychiatrist and an autodidact) in 1943 (!!) and was first implemented in the late 50s by Frank Rosenblatt.
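To get a feeling for how simple the original formal neuron actually is, here is a minimal sketch in Python of a McCulloch-Pitts style threshold unit (a simplified illustration of the basic idea, not a faithful reproduction of the full 1943 formalism, which, e.g., also treats inhibitory inputs specially):

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts style threshold unit: 'fires' (returns 1)
    if and only if the weighted sum of the binary inputs reaches
    the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# A logical AND gate as a single threshold unit:
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mcp_neuron([a, b], [1, 1], threshold=2))
```

Lowering the threshold to 1 turns the same unit into an OR gate — which already hints at why networks of such units can, in principle, compute any Boolean function.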
Computationally modelling, testing or even proving a lot of the mathematical concepts defined in the 20th century or earlier is often still not that easy — we are only just becoming technologically capable of doing so with affordable hardware! The huge difference between intellectual / conceptual and technological capabilities may also be a reason why systems theory partially shifted away from computational approaches towards a more conceptual discourse. Now that AI systems are used more and more — and given the issues around them that we discussed above — some scientists argue to “reclaim AI as a theoretical tool for cognitive science”. We recommend this pre-print by Iris van Rooij et al. on that topic!
In any case, the imprints that the computational / mathematical and the intellectual / conceptual discourses around information theory and technology have left on today’s medical-psychological practice are quite huge — as mentioned before: several forms of cognitive behavioral therapy (incl. mindfulness), the bio-psycho-social model, systemic therapy, cognitive science and neuroscience in general, AI research, a lot of technology for diagnostics (e.g., SPM), genetic research, computational modelling in biology in general…
In later parts of this series we are going to have a closer look at specific current applications of concepts influenced by information theory, e.g., in the medical field. One well known example is the dysconnection hypothesis (Friston et al., 2016), which relies on predictive processing / active inference as a theory of how the nervous system, and in particular the brain, performs inference. Bayesian brain theories, as mentioned before, basically argue that the brain is updating a generative model. The dysconnection hypothesis argues that schizophrenia is the result of aberrant precision weighting of external and internal information flow within that process of updating a model, resulting in delusional or ‘false’ inference.
To understand what precision weighting roughly means, let us think of the following example: think of a tree that — from the shape of it — also looks a bit like a lion from some perspective. Now imagine losing control over the ability to freely tilt the tree into one of the two contexts, a) “it looks like a lion (shape)” and b) “it looks like a regular tree”. In other words: imagine being unable to distinguish between “has the shape of a lion” and “is a lion”. The result of aberrant precision weighting could then be a belief that there literally is a lion, given that a) is overweighted (overfitting; this also fits well to symptoms such as concretism that sometimes appear in patients with schizophrenia). The totally ‘understandable’ reaction to such an inference (such a sudden, highly convincing belief) could be fear, running away, trying to fight the tree, some complex delusional narrative to explain where the lion came from, etc. Paranoia, hallucinations and other phenomena can be explained quite well this way. Active inference / predictive processing also fits well to the neurophysiology and neuroanatomical representations of such a process. First, though, we are going to gain more basic insights into further developments of basic information theory (approximate Bayesian inference in the form of variational inference methods).
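The lion / tree example can be caricatured in a tiny two-hypothesis Bayesian update. Note that this is purely our own illustrative sketch (function name, numbers and the way “precision” enters are all made up for this toy example — it is not the formulation used in the dysconnection hypothesis literature): a “precision” parameter scales how strongly the ambiguous sensory cue counts against the prior.

```python
def posterior(prior_tree, likelihood_tree, likelihood_lion, precision):
    """Toy 'precision weighting': raise the likelihoods to the
    power of the precision, so that high precision lets the
    sensory evidence dominate the prior. Returns the normalized
    posterior probabilities (p_tree, p_lion)."""
    p_tree = prior_tree * likelihood_tree ** precision
    p_lion = (1 - prior_tree) * likelihood_lion ** precision
    z = p_tree + p_lion
    return p_tree / z, p_lion / z

# Strong prior that it is a tree (0.95), but an ambiguous shape
# cue that slightly favours 'lion' (0.6 vs. 0.4):
print(posterior(0.95, 0.4, 0.6, precision=1))   # prior wins: 'tree'
print(posterior(0.95, 0.4, 0.6, precision=10))  # aberrant overweighting: 'lion'
```

With normal precision the prior dominates and the tree is perceived as a tree; with aberrantly high precision the slightly lion-favouring cue overwhelms the prior, and the belief “there is a lion” wins.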
We hope that this part has given you a rough overview of the impact of information theory on fields such as linguistics / semiotics and medicine, and of how it was further developed within the life sciences. We tried not to go too deep into every branch — the intention was more to spark some interest in further research yourself, and we hope we succeeded.
In the fifth part of this series, we are going to look into further extensions of classic Bayesian inference, namely approximate Bayesian inference in the form of variational inference methods. We will also cover some other basics of extended Bayesian inference methods, such as relative entropy / KL divergence. After that, we will introduce you to predictive processing / active inference / the free energy principle and its applications in medical research (part VI). Computational modelling, such as aberrant precision weighting (predictive processing / active inference) in models of perception, can be compared with behavioral as well as EEG or fMRI data of patients. One of the most fascinating things about active inference / predictive processing in particular is the fact that a Bayesian decision model can be translated into a model of neuronal activity, just via some rearrangements.
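As a tiny preview of part V, relative entropy / KL divergence can be computed in a few lines. It measures, in bits, the expected extra code length incurred when encoding samples from a distribution p with a code optimized for a distribution q — and, notably, it is not symmetric (the sketch below assumes discrete distributions given as lists of probabilities over the same outcomes):

```python
from math import log2

def kl_divergence(p, q):
    """Relative entropy D_KL(p || q) in bits for two discrete
    distributions over the same outcomes. Terms with p_i = 0
    contribute nothing by convention (0 * log 0 := 0)."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]   # a fair coin
q = [0.9, 0.1]   # a heavily biased coin
print(kl_divergence(p, q))  # > 0
print(kl_divergence(q, p))  # different value: KL is asymmetric
```

D_KL(p || q) is zero exactly when the two distributions coincide, which is what makes it usable as a “distance-like” objective in variational inference.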