
Into the Stat-o-Sphere

An Introduction into our Tutorial Collection

Published on Aug 11, 2022

You're viewing an older Release (#2) of this Pub.

  • This Release (#2) was created on Mar 30, 2023.
  • The latest Release (#6) was created on Dec 21, 2023.

Review: This article was reviewed by the BEM-Editors Rico Schmitt, Philippa Schunk, Kolja Rath and Joëlle Lousberg.

Would you like to join us and help review or even write our tutorials? Feel free to contact us via: [email protected]


Statistics is probably one of the most unwanted elements of a medical or life science curriculum at any university. It often feels like both students and lecturers do not really care about it, considering it at best a necessary evil.

WE BEG TO DIFFER!

… inviting you to follow us into the Stat-o-Sphere.

Stat-o-Sphere is an open educational editorial collection that provides low-threshold, easy-access and slow-paced statistics tutorials for absolute beginners ─ written by students, for students ─ motivated to evoke and elevate both the research and reviewing abilities of our readers. By conveying and animating the experiential intuition behind statistical description and inference, especially presenting the conceptual beauty and simplicity behind it, we attempt to follow paths that require the least amount of background, such that in theory anyone could gain fundamental insights into statistical reasoning as such.

We, the data and statistics team of Berlin Exchange Medicine, believe that this is not only possible, but necessary ─ for medical students, for biologists, the care sector, even for philosophers. Especially for the field of medicine and care, which naturally comes with an effort of translating scientific reasoning and methodology, statistics should not be considered an outsourceable issue, or a mere heuristic that does not need any further attention. Such an attitude not only underestimates, but intentionally ignores the poetics behind statistical inference in the first place.

In the end, we all know that statistics is essential to our modern understanding of evidence-based medicine. And yet, statistics has suffered from mystification and misinformation in several ways, which was especially evident during the SARS-CoV-2 pandemic. Apart from a rather ‘widespread statistical vertigo’, such mystification can also be encountered in rather banal contexts, e.g., when it comes to discussing or ‘expecting’ the complexity of statistics or mathematics in general. We believe that most of this complexity can be clearly addressed and disentangled. One example is the hidden paradigms that implicitly steer intellectual or conceptual discussions on certain methods, e.g., the often overlooked difference between the Bayesian and the frequentist interpretation of conditional probability (the twisty p-value in particular becomes instantly intuitive when looked at as a special form of conditional probability). The latter difference will be the topic of the first tutorial of our series on inferential statistics, so no worries if none of the terms appear familiar to you!
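To give a first impression of what "a special form of conditional probability" means, here is a minimal sketch in standard textbook notation. The symbols are our choice for illustration only: H stands for a hypothesis, H_0 for the null hypothesis, D for the observed data, T for a test statistic and t_obs for its observed value. The Bayesian reading conditions on the data (Bayes' theorem), while a one-sided p-value conditions on the null hypothesis:

```latex
% Bayesian posterior: probability of the hypothesis, given the data
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}

% Frequentist (one-sided) p-value: probability of a test statistic at least
% as extreme as the observed one, given that the null hypothesis is true
p = P(T \geq t_{\mathrm{obs}} \mid H_0)
```

In words: the posterior asks how probable the hypothesis is given the data, whereas the p-value asks how probable data at least as extreme as ours would be if the null hypothesis were true. The two are not interchangeable.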

Fig. 1 The three realms of interest within the Stat-o-Sphere. At the beginning we will focus on tutorials that are related to programming languages such as R or Matlab and to mathematics in general (this involves greater effort, but gives full flexibility in exploring and operating with data). As soon as our collection has gained a stable basis, we will give general topics such as review or experimental design more attention, as they demand a rather abstract and methodologically broad overview of statistics (linear models, effect size, p-values, F-statistics, power analysis etc. all in one place), and also involve reflection on skills in areas such as text analysis. For these needs we have developed other formats within BEM, such as the review crash course, fellowship programs, as well as the JCed format (educational journal club), which we will accompany with our work. Either way, we are confident that we can provide you with flexible content as soon as our collection and our journal in general have grown.

In the end, statistics is nothing else than our hypothetical argumentation on a course of events translated into another language, essentially mathematical and computer language. Our goal is to explore such ‘translational potentials’ within the field of statistics and brighten hidden complexities by finding new ways of reflecting, encountering and especially speaking about statistics.

Beyond the field of medicine and science, the emergence of new inferential systems, such as advances in information theory or any kind of machine learning, has brought a lot of wider public attention to statistical methods. The public discussion, though, is again mostly shaped by a lack of understanding of how these technologies actually work and especially by withholding what they actually are: a certain kind of mathematical algorithm (i.e., one that can theoretically be calculated on paper). Demystifying statistical inferential methods and technologies, and empowering people to understand instead of opinionizing on statistics is a core goal of our collection, as we believe that probabilistic inference is *probably* the most intuitive field in mathematics and science, even though a lot of us appear to mostly abstract away from ourselves to rather uncanny ‘degrees of freedom’ when encountering graphs and formulas. We argue this to be framed.

In fact, computational neural networks (machine learning), as well as a lot of current theories within computational neuroscience, suggest exactly that: our brain, and every one of its cells, is performing statistical inference in the first place (e.g., the Bayesian brain hypothesis, which suggests that we perform constant hypothesis testing in order to update models of the world and ourselves, poetically addressing both the fallibility of human inference and falsification over time, i.e., updating a model).

By providing insights and basic knowledge on rather complex topics within statistics, such as machine learning or, e.g., epidemiological modelling, we also want to provide access to topics which are rather hard to get access to in the first place.

Fig. 2 StatoSphere is an educational-information providing collection within the BEM editorial section IV, data and statistics, and is roughly proportional to the logarithm to the base of code of the probability of statistics, given tutorials that are in joint with a model of the infinite root of Open Education.

Open education as such is of course closely related to the internet, providing open access information in the form of videos or text. Open education though not only relies on download contingencies, but also on breaking boundaries of communication that lead to a disconnection of information flow in some form or another ─ either by, e.g., excluding diversity, applied gate-keeping strategies, or by contextually drawing lines of privilege, such as prior knowledge.

Open education also plays another crucial ─ but often forgotten ─ role in research, as it opens up the possibilities for interdisciplinary exchange, which only becomes possible, when all sides share their methods in full, in order to evaluate existing and develop new scientific methods together. For this to happen, educational resources for any level of prior knowledge and in general individual attempts of providing and sharing educational information play a central role.

We believe that a lot of the obstacles on the way to understanding statistics mentioned above can be easily overcome. Most educational failures are made on the foundational level and we believe that with a stable intuition, people can be empowered to orientate themselves within mathematical logic. In most of the future tutorials of this collection, the combination of intuition and mathematics will eventually merge into commented code that makes it possible to recalculate every step of the math, play around with the input values and introduce readers to the use of programming languages and statistics software such as R, Matlab, STATA or SPSS.

Fig. 3 Load R package “Stat-o-Sphere” via executing library(“Stat-o-Sphere”) within your freshly opened script (mark the respective line and press ALT+ENTER). We recommend starting with the tutorial function. BEM_Tutorials is the name of a function with the input parameter “beginners”, and within that function some math and coding is done, eventually leading to an output of “advanced”. The function can be called via BEM_Tutorials(INPUT = yourself). (Note that this package does not exist: it is just poetic code).
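For readers who would like to run something similar themselves, the poetic code described in the caption could be sketched in R roughly as follows. This is our own hypothetical reconstruction, not an existing package: since “Stat-o-Sphere” cannot be loaded via library(), the function BEM_Tutorials is defined directly in the script, and the "math and coding" inside it is only a placeholder.

```r
# Poetic sketch only: the package "Stat-o-Sphere" does not exist,
# so we define the function ourselves instead of calling library().
BEM_Tutorials <- function(INPUT = "beginners") {
  # Check that the input is a single character string.
  stopifnot(is.character(INPUT), length(INPUT) == 1)

  # Placeholder for "some math and coding is done".
  message("Working through the tutorials as: ", INPUT)

  # The function always returns "advanced", as described in the caption.
  OUTPUT <- "advanced"
  OUTPUT
}

# Call the function with yourself as input.
yourself <- "beginners"
result <- BEM_Tutorials(INPUT = yourself)
print(result)
```

Marking these lines in a script and executing them should print "advanced" to the console.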

Apart from an understanding of mathematics and the intuition behind it, statistical evaluation always involves the usage of software in some way or another, in other words: a computer doing math. Using such software can be intriguing and even miraculous ─ but also daunting and easily overwhelming all at once. Nevertheless, a vast amount of written and video tutorials produced by highly motivated people, sharing their insights and excitement instantly, gives hope that applying statistical methods to a given set of data is going to go well for us. And in fact, it’s true: online tutorials are a huge success, following a simple rule, a simple principle: sharing knowledge. 

A lot of us ─ including myself for a long time ─ still get no further than failing to properly load a .csv file in R. This, or some results ranging from strange to funny in the beginning, can be enough for some to abandon further attempts, in fear of destroying the data itself. Being stuck for banal reasons, growing uncertainty from clicking one’s way around and dealing with often incomplete information sources (e.g., code given, but no mathematical details) make it hard for beginners to smoothly enter the cybernetic realm of statistical computation.
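As a small illustration of that first hurdle, here is a minimal sketch of loading a .csv file in base R. The data values, column names and the temporary file are made up for this example, so that it runs without any file of your own; with real data you would pass your own file path to read.csv().

```r
# Build a tiny example data set (hypothetical values, for illustration only).
example_data <- data.frame(
  id    = 1:3,
  group = c("control", "treatment", "control"),
  score = c(4.2, 5.1, 3.9)
)

# Write it to a temporary .csv file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
write.csv(example_data, csv_path, row.names = FALSE)

# Read the file back in. read.csv() expects commas as separators and "."
# as the decimal mark; files that use ";" and "," instead can be read
# with read.csv2().
loaded <- read.csv(csv_path)

# Inspect the structure and the first rows before doing anything else.
str(loaded)
head(loaded)
```

Note that read.csv() only copies the contents of the file into R's memory. Whatever you then do with the object loaded, the file on disk stays unchanged unless you explicitly write to it, so there is little reason to fear destroying the data by experimenting.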

We believe that this is not only a shame, especially for freely available open-source programs such as R, but also completely unnecessary. By providing slow-paced tutorials with the least number of missing links, we hope to change the impression of statistics as a nebulous representation of hidden mechanisms, fugitive and even darkly. We also believe data scientific methods should be something that medical students and scientists in general are naturally familiar with, as it paves the path of their clinical and general scientific inference ─ something that should not rely on opinion, habit or instant beliefs.

Fig. 4 Our efforts to provide you with good information will leave us with a constant and seemingly unsolvable ‘optimization problem’, though we will keep searching for methods to keep the error between our readers’ variables ‘expectation’ and ‘reality’ as low as possible when linearly hyping tutorials on statistical methodology.

Our attempts will of course have their own limits ─ but this is where you, the reader, come into play. Our website allows readers to openly give feedback on our tutorials, similar to the open peer review process of our journal articles.

Fig. 5 Ms. Miranda, longing for feedback. We are also happy to announce that our tutorials will be accompanied by the work of feathery and furry specialists. Photo by Alfred Kenneally.

Feedback of any kind (minor or major issues, open questions, tutorial requests) is very welcome ─ either from students or from professionals in the respective field, as our student journal still relies on expertise in some form or another. We are not trying to make ultimate tutorials, even though we are trying our best. Most of all, we believe that the attempt to share and to expand fields of expressing knowledge and ideas is the most central part of any educational effort and of research in general. We are therefore more than open to any interaction with you and more than happy to hear about your experiences with any of our content: equivalent to our open peer-review process, anyone can leave us a comment and ‘review’ our tutorials. In the end, educational content also has to be reviewed in some form or another, as understanding is nothing we can ground for ourselves, as much as we would like to.

Fig. 6 Here we present to you the ‘posterior tutorial probability’, which is defined as the probability of a future tutorial, given current tutorials.

The scope of our tutorials will range from a ‘deeper dive into the actual math’ to rather ‘application- and review-oriented’ approaches to statistics, also related to other projects within BEM, such as JCed, in order to serve the several levels of interest and need related to information on statistical methodology that you, the reader, may have.

Another core feature of our tutorial collection will be the close relation to current publications that provide open data. In such cases we will modularly attach special chapters or pubs to given or newly created tutorials on the applied method, so that current publications optionally serve as an example for, e.g., a tutorial on linear regression models. With this we also want to motivate authors to not only share their work, but also their methods in detail with us (note that we are always more than happy to help people with any level of background for such attempts, even beginners).

So far, we have not planned to provide a full custom-made educational course on statistics, but we will still be covering basic topics such as the mentioned linear regression model and conditional probability, especially in the beginning. Other tutorials will be motivated by interest (expertise in the team) or occasion (open data publications) rather than by ‘relevance’ within a general scope of statistics. However, we are confident that our collection will grow over time, eventually covering a wider range of interests and needs.

The overall goal of our tutorial collection is, again, to provide a room for peer-teaching experiences, as we believe that sharing knowledge and any attempt to do so, is a core competence of any researcher, as well as student (we may also organize “peer-teaching events” of some kind, but we will see).

Fig. 7 Possible dependencies between authors and readers. On the left side we see the worst case, i.e., an absolute ‘code red’ scenario between authors and readers: the authors are here considered independent, as they influence the readers by their published work, but are themselves unfortunately not influenced by the readers. On the right side we see our desired scenario of an interdependency between authors and readers over time.

However, if you are interested and motivated to contribute to our collection, sharing your methods in some way or another, or if you are interested in contributing to our editorial section or journal in general, feel free to contact us via: [email protected]

Title image by Eberhard Grossgasteiger.
