
Into the Stat-o-Sphere

An Introduction to our Tutorial Collection

Published on Aug 11, 2022

Review: This article was reviewed by the NOS-Editors Rico Schmitt, Philippa Schunk, Kolja Rath and Joëlle Lousberg.

Would you like to join us and help review, or even write, our tutorials? Feel free to contact us at: [email protected]


Statistics is probably one of the most unwanted elements of a medical or life science curriculum at any university. It often feels as if neither students nor lecturers really care about it, considering it at best a necessary evil.

WE BEG TO DIFFER!

… inviting you to follow us into the Stat-o-Sphere.

Stat-o-Sphere is an open educational editorial collection that provides low-threshold, easy-access and slow-paced statistics tutorials for absolute beginners, written by students for students, with the aim of evoking and strengthening both the research and the reviewing abilities of our readers. By conveying and animating the intuition behind statistical description and inference, and especially by presenting its conceptual beauty and simplicity, we try to follow paths that require as little background knowledge as possible, so that in principle anyone can gain fundamental insight into statistical reasoning as such.

We, the data and statistics team of NOS, believe that this is not only possible but necessary: for medical students, for biologists, for the care sector, even for philosophers. Especially in medicine and care, fields that inherently involve translating scientific reasoning and methodology into practice, statistics should not be treated as something that can be outsourced, or as a mere heuristic that needs no further attention. Such an attitude not only underestimates but deliberately ignores both the poetics and the responsibility of applying statistical inference in the first place. Patients, too, rely on medical personnel who are able to convey the basic logic behind their inferences and decisions.

In the end, we all know that statistics is essential to our modern understanding of evidence-based medicine. And yet, statistics has suffered from mystification and misinformation in several ways, problems that were especially prevalent during the SARS-CoV-2 pandemic. Apart from a rather ‘widespread statistical vertigo’, such mystifications can also be encountered in rather banal contexts, e.g., when it comes to discussing or ‘expecting’ the complexity of statistics or of mathematics in general. We believe that most of this complexity can be clearly addressed and disentangled, whether it stems from misleading or false verbal definitions of mathematical concepts or from hidden paradigms that implicitly guide intellectual or conceptual discussions of certain methods. The often overlooked difference between the Bayesian and the frequentist interpretation of conditional probability is one example (the notoriously twisty p-value, in particular, becomes intuitive once it is viewed as a special case of conditional probability). Note that this paradigmatic difference will also be the first topic of our series on inferential statistics.
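To make the point about conditional probability concrete, here is a minimal Python sketch (our own illustration, not taken from the announced series). It computes a one-sided p-value as the conditional probability P(at least 15 heads | H0: fair coin) for a hypothetical coin that landed heads 15 times in 20 tosses, and contrasts it with the Bayesian posterior P(H0 | 15 heads). The alternative hypothesis (a coin biased to 0.75) and the 50:50 prior are assumptions chosen purely for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(k_obs: int, n: int, p0: float = 0.5) -> float:
    """Frequentist p-value: P(X >= k_obs | H0: p = p0), a conditional probability."""
    return sum(binom_pmf(k, n, p0) for k in range(k_obs, n + 1))


def posterior_h0(k_obs: int, n: int, p0: float = 0.5,
                 p1: float = 0.75, prior_h0: float = 0.5) -> float:
    """Bayesian P(H0 | X = k_obs) when only H0 (p = p0) and a hypothetical
    alternative H1 (p = p1) are considered, via Bayes' rule."""
    like_h0 = binom_pmf(k_obs, n, p0)
    like_h1 = binom_pmf(k_obs, n, p1)
    evidence = like_h0 * prior_h0 + like_h1 * (1 - prior_h0)
    return like_h0 * prior_h0 / evidence


n, k = 20, 15  # hypothetical data: 15 heads in 20 tosses
p_val = one_sided_p_value(k, n)
post = posterior_h0(k, n)
print(f"P(X >= {k} | H0) = {p_val:.4f}")
print(f"P(H0 | X = {k})  = {post:.4f}")
```

Running it prints 0.0207 for the p-value and 0.0681 for P(H0 | data). The two numbers answer different conditional questions, and only the second one depends on the assumed alternative and prior.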

Fig. 2 The three realms of interest within the Stat-o-Sphere. At first, we will focus on tutorials related to programming languages such as R or Matlab and to mathematics in general (this involves greater effort but gives full flexibility in exploring and working with data). Once our collection has a stable basis, we will give more attention to general topics such as reviewing or experimental design, as these demand a rather abstract and methodologically broad overview of statistics (linear models, effect sizes, p-values, F-statistics, power analysis, etc., all in one place) and also involve reflecting on skills in areas such as text analysis. For these needs, we have developed other formats within NOS, such as the review crash course, fellowship programs and the JCed format (educational journal club), which our work will accompany. Either way, we are confident that we will be able to provide you with flexible content as our collection, and our journal in general, grows.

In the end, statistics is nothing other than our hypothetical reasoning about a course of events, translated into another language: essentially, the language of mathematics and computers. Our goal is to explore such ‘translational potentials’ within the field of statistics and to shed light on hidden complexities by finding new ways of reflecting on, encountering and especially speaking about statistics.

Beyond medicine and science, the emergence of new inferential systems, such as advances in information theory or machine learning of any kind, has drawn considerable public attention to statistical methods. The public discussion, however, is again mostly shaped by a lack of understanding of how these technologies actually work, and especially by obscuring what they actually are: a certain kind of mathematical algorithm (i.e., one that could, in theory, be calculated on paper). Demystifying statistical inference methods and technologies, and empowering people to understand statistics rather than merely hold opinions about it, is a core goal of our collection, as we believe that probabilistic inference is *probably* the most intuitive field in mathematics and science, even though many of us seem to abstract away from ourselves into rather uncanny ‘degrees of freedom’ when confronted with graphs and formulas. We argue that this is a matter of framing.

In fact, artificial neural networks (machine learning), as well as many current theories in computational neuroscience, suggest exactly that: our brain, down to each of its cells, performs statistical inference in the first place. One example is the Bayesian brain hypothesis, which suggests that we constantly test hypotheses in order to update our models of the world and of ourselves, poetically capturing both the fallibility of human inference and falsification over time (i.e., updating a model). Fields such as computational neuroscience, psychophysics and mathematical psychology are not the only topics we hope to introduce you to, since (potential) applications of statistical methods can be found in every academic field.
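As a toy illustration of ‘updating a model’ over time, the sketch below uses the textbook Beta-Bernoulli model, in which each new observation turns the previous posterior into the next prior. It is a generic example of sequential Bayesian updating, not a model of the brain; the class name BetaBelief and the observation sequence are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Belief about an unknown probability theta, encoded as Beta(alpha, beta)."""
    alpha: float = 1.0  # pseudo-count of "successes" (Beta(1, 1) is a uniform prior)
    beta: float = 1.0   # pseudo-count of "failures"

    def update(self, observation: bool) -> "BetaBelief":
        """Bayes' rule for one Bernoulli observation; the returned posterior
        serves as the prior for the next observation."""
        if observation:
            return BetaBelief(self.alpha + 1, self.beta)
        return BetaBelief(self.alpha, self.beta + 1)

    @property
    def mean(self) -> float:
        """Posterior mean estimate of theta."""
        return self.alpha / (self.alpha + self.beta)


# Hypothetical sequence of observations (True = "success").
observations = [True, True, False, True, True, True, False, True]

belief = BetaBelief()
for t, obs in enumerate(observations, start=1):
    belief = belief.update(obs)
    print(f"after {t} observation(s): Beta({belief.alpha:.0f}, {belief.beta:.0f}), "
          f"mean = {belief.mean:.3f}")
```

Starting from a uniform prior, the six successes and two failures leave the belief at Beta(7, 3) with a mean of 0.7. Any further observation would shift it again, which is the revisable, fallible character of inference described above.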

By providing insights and basic knowledge on rather complex topics within statistics, such as machine learning or epidemiological modelling, we also want to open up topics that are otherwise hard to access.

Fig. 3 Stat-o-Sphere is an educational collection within NOS editorial section IV (data and statistics) and is roughly proportional to the logarithm to the base of code of the probability of statistics, given tutorials in joint with a model of the infinite root of Open Education.

Open education as such is, of course, closely tied to the internet, which provides open-access information in the form of videos or text. However, open education does not rely only on the availability of downloads; it also depends on breaking down communication barriers that disrupt the flow of information in one way or another, for example by excluding diversity, by applying gate-keeping strategies, or by drawing lines of privilege such as prior knowledge.

Open education also plays another crucial but often forgotten role in research: it opens up the possibility of consistent interdisciplinary exchange, which only becomes possible when all sides share their methods in full, so that existing scientific methods can be evaluated and new ones developed together. For this to happen, educational resources for every level of prior knowledge, and individual efforts to provide and share educational information in general, play a central role.

Fig. 1 A recent article in Nature discussed the work of John Carlisle, an anaesthetist who checks medical papers for flaws and fraud (for the latter he uses the term ‘zombie trials’). There are two answers to the problem above: a) more open access, including open data; b) most of all, people should spend more time, and be given more time and guidance, to understand statistical methods so that they can take on the corresponding responsibility. The methods Carlisle uses are simple and well known, e.g., the so-called Benford’s law (German video by arte / English video by Numberphile). Scientific methodology should not be based on superficial attitudes, nor on a habitus that tries to gate-keep its way through science by dismissing critical thinking about statistics as “not necessary”. Our open science agenda tries to make a difference in this respect by working against a publish-or-perish culture, in which superficial statistical analysis appears to thrive most.
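Benford’s law states that in many naturally occurring data sets spanning several orders of magnitude, the leading digit d appears with probability log10(1 + 1/d), so a 1 leads about 30% of the time and a 9 less than 5% of the time. The minimal Python sketch below compares this expected distribution with the observed leading digits of a data set. As stand-in data it uses the first 1000 powers of 2 (a sequence known to follow the law), not data from any trial, and for simplicity it handles integer values only; the function names are our own.

```python
import math
from collections import Counter


def benford_expected() -> dict[int, float]:
    """Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n: int) -> int:
    """First significant digit of a nonzero integer."""
    return int(str(abs(n))[0])


def observed_frequencies(values: list[int]) -> dict[int, float]:
    """Relative frequency of each leading digit 1..9 among the nonzero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


# Stand-in data (an assumption, not trial data): 2**1, 2**2, ..., 2**1000.
data = [2**k for k in range(1, 1001)]

expected = benford_expected()
observed = observed_frequencies(data)
for d in range(1, 10):
    print(f"digit {d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

For the powers of 2, exactly 301 of the 1000 values start with a 1, very close to the expected 0.301. In real data, a deviation from Benford’s distribution is at most a reason to look closer, not proof of fabrication.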

We believe that many of the obstacles to understanding statistics mentioned above can be overcome easily. Most educational failures happen at the foundational level, and we believe that with a stable intuition, people can be empowered to find their own way through mathematical logic. In most future tutorials of this collection, intuition and mathematics will eventually merge into commented code that makes it possible to recalculate every step of the math and to play around with the input values, and that introduces readers to programming languages and software such as R, Matlab, Python, Stata, SPSS…

Fig. 4 Load the R package “Stat-o-Sphere” by executing library("Stat-o-Sphere") in your freshly opened script (mark the respective line and press ALT+ENTER). We recommend starting with the tutorial function: NOS_Tutorials is a function with the input parameter “beginners”, within which some math and coding is done, eventually leading to the output “advanced”. The function can be called via NOS_Tutorials(INPUT = yourself). (Note that this package does not exist: it is just poetic code.)

Apart from an understanding of mathematics and the intuition behind it, statistical evaluation always involves software in one way or another; in other words, a computer doing math. Using such software can be intriguing, even miraculous, but also daunting and easily overwhelming, all at once. Nevertheless, the vast number of written and video tutorials produced by highly motivated people who readily share their insights and excitement gives hope that applying statistical methods to a given data set will go well for us. And in fact, it’s true: online tutorials are a huge success because they follow one simple principle: sharing knowledge.

Many of us (including myself, for a long time) still don’t get further than failing to properly load a .csv file in R. Such obstacles, or output ranging from strange to funny, can be enough for some to abandon further attempts, out of fear of destroying the data itself or out of the belief that they fundamentally lack the knowledge to work with programming languages such as R in the first place. Being stuck for banal reasons, growing uncertainty from clicking one’s way around, and often incomplete information sources (e.g., code given, but no mathematical details) make it hard for beginners to smoothly enter the cybernetic realm of statistical computation.

We believe that this is not only a shame, especially for open-source programs such as R, but also completely unnecessary. By providing slow-paced tutorials with as few missing links as possible, we hope to change the impression of statistics as a nebulous representation of hidden mechanisms, elusive and even dark. We also believe that data science methods should be something that medical students, and scientists in general, are naturally familiar with, as these methods pave the way for clinical and general scientific inference, which should not rely on opinion, habit or snap judgements.

Fig. 5 Our efforts to provide you with good information leave us with a constant and seemingly unsolvable ‘optimization problem’; still, we will keep searching for ways to keep the error between our readers’ variables ‘expectation’ and ‘reality’ as low as possible when linearly hyping tutorials on statistical methodology.

Our attempts will, of course, have their own limits, and this is where you, the reader, come into play. Our website allows readers to give open feedback on our tutorials, similar to the open peer-review process for our journal articles.

Fig. 6 Ms. Miranda, longing for feedback. We are also happy to announce that our tutorials will be accompanied by the work of feathery and furry specialists. Photo by Alfred Kenneally.

Feedback of any kind (minor or major issues, open questions, tutorial requests) is very welcome, whether from students or from professionals in the respective field, as our student journal still relies on expertise in one form or another. We are not trying to write definitive tutorials, even though we are doing our best. Above all, we believe that the attempt to share knowledge and ideas, and to expand the ways of expressing them, is central to any set of educational or research ideals. We are therefore open to any interaction with you and more than happy to hear about your experiences with any of our content: as in our open peer-review process, anyone can leave a comment and ‘review’ our tutorials. In the end, educational content also has to be reviewed in one form or another, as understanding is not something we can instantly establish on our own, however much we would like to give you instant access to any knowledge.

Fig. 7 Here we present to you the ‘posterior tutorial probability’, which is defined as the probability of a future tutorial, given current tutorials.

The scope of our tutorials will range from ‘deep dives into the actual math’ to more ‘application- and review-oriented’ approaches to statistics, also linked to other projects within NOS, such as JCed, in order to serve the different levels of interest in and need for information on statistical methodology that you, the reader, may have.

Another core feature of our tutorial collection will be its close connection to current publications that provide open data. In such cases, we will attach special chapters or pubs as modules to existing or newly created tutorials on the method applied, so that current publications can optionally serve as examples, e.g., in a tutorial on linear regression models. With this, we also want to motivate authors to share not only their work but also their methods in detail with us (note that we are always more than happy to help people with any level of background, even beginners, with such efforts).

Apart from the basics, other tutorials will be motivated by interest (expertise in the team) or occasion (open data publications) rather than by ‘relevance’ within the general scope of statistics. However, we are confident that our collection will grow over time, eventually covering a wider range of interests and needs.

The overall goal of our tutorial collection is, again, to provide a space for peer-teaching experiences, as we believe that sharing knowledge, and any attempt to do so, is a core competence of every researcher and student (we may also organize “peer-teaching events” of some kind, but we will see). Students and teachers should take, and be given, more time to gain a lasting and consistent, rather than heuristic, understanding of the logic behind statistics and probability theory.

Fig. 8 Possible dependencies between authors and readers. On the left, we see the worst case, an absolute ‘code red’ scenario between authors and readers: the authors are considered independent here, as they influence the readers through their published work but are, unfortunately, not influenced by the readers themselves. On the right, we see our desired scenario: interdependence between authors and readers over time.

If you are interested in contributing to our collection and sharing your methods in one way or another, or in contributing to our editorial section or journal in general, feel free to contact us at: [email protected]

Title image by Eberhard Grossgasteiger.
