Chaos, Entropy, Order

The following journal will be based on excerpts from the notes of my ongoing investigation into the paradigm of 'Chaos, Entropy, Order', seeking to establish the foundations of the upcoming transition from Tier 1 (deterministic and closed-form thinking) to Tier 2 (non-deterministic/stochastic emergence of macroscopic/holistic order). It is inspired by the long-forgotten history of my ancestry, namely the Orphic tradition, and more specifically the Orphic triad of Chaos, Eros and Gaia (a term used by Ralph Abraham, though miscategorized as Greek in origin).

Its primary purpose is a rigorous reformulation of contemporary Mathematics, Statistics and the Theory of Computation (based on my background), taking holistic and interdisciplinary inspiration from a variety of topics and subjects, such as Statistical Mechanics, Measure Theory & Functional Analysis, Integrable Probability, Category Theory, Representation Theory, the Free Energy Principle, Active Inference, Bayesian Mechanics, Chaos Theory, QFT, Fluid Computing, etc. The format will be loose and free-flowing, but primarily technical in nature.

Scientists whose work serves as inspiration: Karl Friston, Michael Levin, Chris Fields, H. Haken, John Baez, Ilya Prigogine, David Spivak, S. Smirnov, A. Grothendieck, A. Kolmogorov, A. Borodin, etc.

It has been by far the largest recontextualization in my neural net to date, combining years of experience in both the spiritual/psychonaut and the mathematical/scientific realms. It has given me a decade-long purpose with both spiritual and technical realization.

Date of crystallization in consciousness: 18th of August, 2024

Main ideas

- First is the postulate of the fundamentally unknowable and intractable nature of Chaos, i.e., the ontology of Consciousness in its primal form.

- The second is the postulate of entropy. From it follows that Mathematics and Science are our generative models, built by sampling Chaos through the Active Inference/Free-Energy Principle with the purpose of minimizing surprise (maximizing entropy). The principle implies a never-ending cycle of model building/refinement and eliminates the existence of a UFT/ToE, i.e., an 'end to science'.

Papers and books such as 'On the statistical mechanics of life', 'Every Life is on Fire', 'Evolution 2.0' and 'Undeniable' examine in more depth the misconception that entropy implies disorder - in fact the opposite holds: Life emerged as the most efficient driver of entropy:

'You start with a random clump of atoms, and if you shine light on it for long enough, it should not be so surprising that you get a plant' - Jeremy England

- Third is the postulate of abstraction/order - Reality as a Holon - described by the need to observe reality at different levels independently:

Quantum Fields -> Particles -> Atoms -> Molecules -> Cells -> Tissue -> Organs -> Organisms -> Societies/herds/flocks -> Ecosystems -> ...

Cognition/Intelligence is present everywhere - from the smallest collective behavior of electrons, to the largest systems and networks.

Technical Foundations

The complexity we encounter in contemporary problems of energy, climate, societies and ecosystems cannot be handled by today's human-centric, deterministic and closed-form solutions. The need for stochastic and emergent systems has become apparent. The theory would be based on the following technical undertakings:
- Transitioning from Hamiltonian (deterministic, trajectory-based) to Langevin dynamics (stochastic, probability-based); see the sketch after this list.

- Categorification of Probability Theory and Stochastic Systems, allowing for the abstraction of the entropy/free-energy principle as a topological operad, applicable to any system, ranging from plasma and fluids, to cellular and biological networks, to contemporary AI models.

- Embedding of the novel framework into an appropriate computational paradigm, resting on the idea of functorial bridges.

- Transitioning away from Set Theory towards Topos-inspired foundations of mathematics, allowing all current mathematical structures to be embedded as macroscopic phenomena of microscopically stochastic systems.
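
A minimal sketch of the Hamiltonian-to-Langevin shift from the first point above, using a toy double-well potential (the potential, step sizes and inverse temperature here are illustrative assumptions): the Hamiltonian integrator reproduces the same trajectory on every run, while the Langevin integrator produces samples from a distribution over endpoints.

```python
import numpy as np

def grad_V(q: float) -> float:
    # gradient of the double-well potential V(q) = q^4/4 - q^2/2
    return q**3 - q

def hamiltonian(q0: float, p0: float, dt: float = 1e-3, steps: int = 10_000) -> float:
    q, p = q0, p0
    for _ in range(steps):          # symplectic Euler: a single deterministic trajectory
        p -= dt * grad_V(q)
        q += dt * p
    return q

def langevin(q0: float, beta: float = 2.0, dt: float = 1e-3,
             steps: int = 10_000, seed: int = 0) -> float:
    rng, q = np.random.default_rng(seed), q0
    for _ in range(steps):          # Euler-Maruyama: dq = -grad V dt + sqrt(2/beta) dW
        q += -grad_V(q) * dt + np.sqrt(2 * dt / beta) * rng.normal()
    return q

print("Hamiltonian endpoint:", hamiltonian(0.0, 1.0))                       # identical every run
print("Langevin endpoints:", [round(langevin(0.0, seed=s), 2) for s in range(3)])  # a sample, not a point
```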

 

Each postulate and technical direction is sufficiently rich to warrant year-long investigations. For that purpose, my timeline and approach are strategically adjusted based on my current technical background, as well as my vision for the next few years. Current scope of investigation: the rigorous foundations of Random Matrix Theory (introduced by Wigner in the statistical study of nuclear spectra) based on Heavy-Tailed distributions. The inspiration has been work by researchers at DeepMind, Gerard Ben Arous, and Charles Martin, the latter of whom has empirically found a direct connection between the weight matrices of NNs and model performance, independent of training and testing data.

 


Newly discovered mathematician and program in the foundations of mathematics:

Homotopy Type Theory, with Vladimir Voevodsky as a primary author. Similar to Barry Mazur, his only higher-education degree is in fact a PhD from Harvard, having been expelled from Moscow State University after skipping classes and failing them. Nonetheless, his research papers were so impressive that he got into Harvard without even applying.

His formulation of motivic cohomology (the study of invariants of algebraic varieties and schemes) is believed to be the correct one, uniting previously distinct cohomology theories such as singular (what I am studying now), de Rham, étale and crystalline cohomology. (Definitely a topic I need to consider for my PhD.)

During this foundational work, for which he received a Fields Medal in 2002, he started realizing that 'human brains could not keep up with the ever-increasing complexity of mathematics. Computers were the only solution.'

His univalent foundations program has in fact been formalized in the *Coq* proof assistant. The main idea behind homotopy type theory is that types can be regarded as spaces in homotopy theory, or as higher-dimensional groupoids in category theory. Each type forms a weak ∞-groupoid, with iterated identity types supplying the k-morphisms at level k. Paths through this multi-dimensional space represent proofs (e.g., a proof of existence amounts to exhibiting an inhabitant of the corresponding space).
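
A minimal illustration of the 'paths as proofs' idea in Lean (my own toy example, using plain identity types rather than the full univalent machinery):

```lean
-- An equality `a = b` plays the role of a path in the "space" α;
-- transitivity of equality is path composition.
example {α : Type} {a b c : α} (p : a = b) (q : b = c) : a = c :=
  Eq.trans p q

-- Transport: a path lets us carry an inhabitant of `P a` over to `P b`,
-- the type-theoretic analogue of moving a fibre along a path.
example {α : Type} (P : α → Type) {a b : α} (p : a = b) (x : P a) : P b :=
  p ▸ x
```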

An analogy that was given on reddit, which I really liked, is that Topos Theory is the computer architecture and hardware construction (mathematical universe), whereas HoTT is the software and programs in it (proofs). 

Definitely have to read the original (and still the only) book on the topic - https://hott.github.io/book/hott-ebook-15-ge428abf.pdf


Pulled an 11h day working through the literature. The meeting with my advisor is scheduled for Wednesday. Current thesis proposal draft:
 

# Thesis Proposal: Heavy-tailed Random Matrices  

## Introduction

Recent developments in the field of Deep Neural Networks (DNNs) have proven incredibly effective in extracting correlations from data. However, the current paradigm and methodology are still largely based on heuristics and lack the theoretical underpinnings necessary for both prescriptive and explanatory properties. Many approaches have been proposed with the purpose of alleviating this so-called 'black box problem' (i.e., lack of interpretability), ranging from the early attempts at using Vapnik-Chervonenkis (VC) theory [1] to subsequent applications of Statistical Mechanics [2-4]. Arguably, none have been as effective at predicting the quality of state-of-the-art trained models as Random Matrix Theory (RMT) [5,6], and more specifically, the recently established Theory of Heavy-Tailed Self-Regularization (HT-SR) by Martin and Mahoney [7-11]. Their empirical results have led to the creation of novel metrics, as well as a variety of interesting theoretical results with respect to the study of the generalization properties of stochastic gradient descent (SGD) under heavy-tailed noise [12,13].

## Background and Significance

### HT-SR Theory

Martin and Mahoney's approach is based on the study of the empirical spectral density (ESD) of layer matrices and their distributions [7]. More specifically, they consider \( N \times M \) (\( N \geq M \)) real-valued weight matrices \( W_l \) with singular value decomposition \( \mathbf{W} = \mathbf{U}\mathbf{\Sigma}\mathbf{V}^T \), where \( \nu_i = \mathbf{\Sigma}_{ii} \) is the \( i \)-th singular value and \( p_i = \nu_i^2/\sum_i \nu_i^2 \). They define the associated \( M \times M \) correlation matrix \( \mathbf{X}_l = \frac{1}{N}\mathbf{W}_l^T\mathbf{W}_l \) and compute its eigenvalues, i.e., \( \mathbf{X}\mathbf{v}_i =\lambda_i\mathbf{v}_i \), where \( \forall_{i=1, \cdots, M}\ \lambda_i = \nu_i^2 \). They subsequently categorize 5+1 phases in the training dynamics by modeling the elements of the latter matrices using Heavy-Tailed distributions, i.e., \( W_{ij}\sim P(X)\sim \frac{1}{x^{1+\mu}}, \mu>0 \), whereas the ESD \( \rho_N(\lambda) \) likewise exhibits Heavy-Tailed properties. Excluding the two initial phases and that of over-training (+1), there are 3 phases of interest, categorized by their better generalization, namely:

- **Weakly Heavy-Tailed**: \( 4 < \mu \) with Marchenko-Pastur behavior in the finite limit and Power-Law statistics at the edge.
- **Moderately Heavy-Tailed**: \( 2 < \mu < 4 \) with \( \rho(\lambda)\sim \lambda^{-1-\mu/2} \) at finite size and \( \rho_N(\lambda)\sim \lambda^{-a\mu+b} \) at infinite size, whereas the parameters \( a, b \) are empirically fitted using linear regression. Maximum eigenvalues follow the Frechet distribution.
- **Very Heavy-Tailed**: \( 0 < \mu < 2 \), where the ESD is Heavy-Tailed/PL for all finite \( N \) and converges for \( N\rightarrow\infty \) to a distribution with tails \( \rho(\lambda)\sim \lambda^{-1-\mu/2} \). The maximum eigenvalues again follow a Frechet distribution.
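
As a concrete illustration of the ESD construction described above, the following is a minimal NumPy sketch; the synthetic Pareto weight matrix and the Hill-style tail estimator are illustrative assumptions, not part of HT-SR itself.

```python
import numpy as np

def layer_esd(W: np.ndarray) -> np.ndarray:
    """Eigenvalues of X = W^T W / N for an N x M weight matrix W (N >= M)."""
    N, M = W.shape
    assert N >= M, "expects the tall orientation N >= M"
    X = W.T @ W / N
    return np.linalg.eigvalsh(X)

def hill_alpha(eigs: np.ndarray, k: int = 50) -> float:
    """Continuous power-law MLE (Hill-style) for the tail exponent,
    fitted on the k largest eigenvalues with x_min set to the k-th largest."""
    tail = np.sort(eigs)[-k:]
    x_min = tail[0]
    return 1.0 + k / np.sum(np.log(tail / x_min))

# toy usage: a synthetic weight matrix with heavy-tailed (Pareto) entries
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=(1024, 512))
W = rng.pareto(a=2.5, size=(1024, 512)) * signs
eigs = layer_esd(W)
print(f"estimated ESD tail exponent: {hill_alpha(eigs):.2f}")
```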

### Significance

The theory of HT-SR has led to interesting results both for the sake of applicability and from a purely theoretical standpoint. The practicality of this work has become apparent due to the development of more efficient training policies, such as temperature balancing [9], as well as real-time metrics like the Frobenius Norm, Spectral Norm, Weighted Alpha, and \( \alpha \)-Norm, which are calculated using HT-SR independently of the training and testing data [10].

On the other hand, the empirical observations have inspired the construction of stronger bounds for the generalization properties of SGD's trajectories via stochastic differential equations (SDEs) under heavy-tailed gradient noise [12]. These bounds have indicated a 'non-monotonic relationship between the generalization error and heavy tails,' and have been developed into a general class of objective functions based on the Wasserstein stability bounds for heavy-tailed SDEs and their discretization [13].

The aforementioned results support the claim that a more detailed study of the ESDs of various open-source models can lead to a refined understanding of the phenomenology and provoke interesting theoretical insights.

It is also important to mention the feasibility of this work, as the empirical component does not require extensive computational resources. Using open-sourced models and their weights, the analysis can be performed on a local machine without significant overhead.

## Objectives & Methodology

The goals of this paper are two-fold:

- To present a theoretical exposition of the 'relatively new branch' from RMT [14], specifically that of heavy-tailed random matrices, by citing the rapidly developing literature [15-18].
- To expand the empirical results of HT-SR by applying refined classification through the use of Maximum Likelihood Estimation (MLE) with respect to a range of heavy-tailed distributions, instead of linear regression for a Power-Law fit. Additionally, the paper aims to examine a wide array of open-source models varying in architecture and underlying symmetries.

### Empirical Study

The methodology proposed follows Martin and Mahoney’s approach [7]—studying the ESD of layer weight matrices of DNNs. Their classification of training dynamics involves 5+1 phases determined by the deviation of Bulk Statistics from the standard Marchenko-Pastur Distribution towards a Heavy-Tailed distribution.

Martin and Mahoney estimate the extent of heavy-tailed behavior through linear regression on the log-log plot of the empirically fitted Power-Law exponent \( \alpha \). While sufficient for their stated "aim for an operational theory that can guide practice for state-of-the-art DNNs, not for idealized models for which one proves theorems" [8], this approach is agnostic to the underlying heavy-tailed distribution and potentially misses valuable information. Studies of heavy tails have noted the unreliability of using linear regression for estimating the scaling parameter \( \alpha \) [19].

To address this issue, we propose using MLE with respect to different heavy-tailed distributions, such as the Pareto, Cauchy, Levy, Weibull, or Frechet distributions. The latter is particularly meaningful given the empirical observations in HT-SR [8]. This approach aims to refine the classification of underlying distributions by analyzing a broader array of models, such as the 16 open-source symmetry-informed geometric representation models of *Geom3D* [20].
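
A minimal sketch of such a comparison; the candidate list, the use of scipy.stats' generic MLE fits, and the AIC-based ranking are assumptions made here for illustration, not the proposal's final methodology.

```python
import numpy as np
from scipy import stats

CANDIDATES = {
    "pareto": stats.pareto,
    "cauchy": stats.cauchy,
    "levy": stats.levy,
    "weibull_min": stats.weibull_min,
    "frechet (invweibull)": stats.invweibull,
}

def rank_tail_fits(tail: np.ndarray) -> list[tuple[str, float]]:
    """Fit each candidate family to the ESD tail by MLE and rank by AIC (lower is better)."""
    results = []
    for name, dist in CANDIDATES.items():
        params = dist.fit(tail)                      # generic MLE fit
        loglik = np.sum(dist.logpdf(tail, *params))  # log-likelihood at the fitted parameters
        aic = 2 * len(params) - 2 * loglik
        results.append((name, float(aic)))
    return sorted(results, key=lambda r: r[1])

# example usage: rank fits on the largest 10% of the ESD eigenvalues
# eigs = layer_esd(W)                      # from the earlier sketch
# tail = np.sort(eigs)[-len(eigs) // 10:]
# print(rank_tail_fits(tail))
```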

### Theoretical

The purpose of providing a theoretical exposition of heavy-tailed random matrices is, first and foremost, to consolidate what is currently a rich but largely disconnected body of literature [14-19]. With only a single dedicated chapter in the Oxford Handbook of RMT [14] and few theoretical surveys, it is hard to put the earlier work in context.

The consequences of this can be seen through a single example, found in a paper predating the HT-SR theory by more than 11 years. More specifically, observe the following theorem (Theorem 1 [21]) for random matrices with i.i.d. heavy-tailed entries, i.e., $(a_{ij}), 1\leq i\leq n, 1\leq j\leq n$ with $1-F(x)=\bar{F}(x)=\mathbb{P}\left(\left|a_{i j}\right|>x\right)=L(x) x^{-\alpha}$, where $0<\alpha<4$ and $\forall t>0, \lim_{x\rightarrow\infty}\frac{L(tx)}{L(x)}=1$. With the additional assumption of $\mathbb{E}(a_{ij})=0$ for $2\leq \alpha <4$, the theorem states that the random point process $\hat{\mathcal{P}}_n = \sum_{1\leq i\leq j\leq n}\delta_{b_n^{-1}|a_{ij}|}$ converges to a Poisson point process with intensity $\rho(x) =\alpha\cdot x^{-1-\alpha}$. This theoretical result in fact matches precisely the values used to classify one of the phase transitions in HT-SR [7] w.r.t. $\alpha$, as well as the power-law exponent of the linear regression fit.

Furthermore, its corollary (Corollary 1 [21]) gives theoretical justification for what the authors of HT-SR [7] observe to be the Frechet distribution fit for the maximum eigenvalues within that same heavy-tailed phase. Not only that, but its Poisson process phenomenology seems to agree with the underlying assumption behind one of the aforementioned theoretical results, namely that SDE trajectories of SGD are well-approximated by a Feller process [12]. This suggests that the latter results are exceptionally interesting due to their potential to serve as theoretical grounding for what is currently only an empirical theory. A more rigorous exposition of the material, paired with the aforementioned empirical analysis, has the potential to give clarity to what is already being developed as a potential theory for the learning of DNNs.

## References

[1] V. Vapnik, E. Levin, and Y. Le Cun. Measuring the VC-dimension of a learning machine. Neural Computation, 6(5):851–876, 1994.
[2] A. Engel and C. P. L. Van den Broeck. Statistical Mechanics of Learning. Cambridge University Press, New York, NY, USA, 2001.
[3] Y. Bahri, J. Kadmon, J. Pennington, S. S. Schoenholz, J. Sohl-Dickstein, and S. Ganguli. Statistical Mechanics of Deep Learning. Annual Review of Condensed Matter Physics, 11:501–528, 2020.
[4] S. S. Schoenholz, J. Pennington, and J. Sohl-Dickstein. A Correspondence Between Random Neural Networks and Statistical Field Theory, 2020.
[5] J. Pennington and P. Worah. Nonlinear random matrix theory for deep learning. NIPS, 2017.
[6] J. Pennington and Y. Bahri. Geometry of Neural Network Loss Surfaces via Random Matrix Theory. PMLR 70, 2017.
[7] C. H. Martin and M. W. Mahoney. Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning. Journal of Machine Learning Research, 22(165):1–73, 2021.
[8] C. H. Martin and M. W. Mahoney. Traditional and heavy-tailed self regularization in neural network models. International Conference on Machine Learning, 2019.
[9] Y. Zhou, T. Pang, K. Liu, C. H. Martin, M. W. Mahoney, and Y. Yang. Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training. NeurIPS, 2023.
[10] C. H. Martin, T. S. Peng, and M. W. Mahoney. Predicting trends in the quality of state-of-the-art neural networks without access to training or testing data. Nature Communications, 12(1):1–13, 2021.
[11] C. H. Martin and M. W. Mahoney. Heavy-tailed universality predicts trends in test accuracies for very large pre-trained deep neural networks. SIAM International Conference on Data Mining, 2020.
[12] U. Şimşekli, O. Sener, G. Deligiannidis, and M. A. Erdogdu. Hausdorff dimension, heavy tails, and generalization in neural networks. Journal of Statistical Mechanics, 2021(12), 2021.
[13] A. Raj, L. Zhu, M. Gürbüzbalaban, and U. Şimşekli. Algorithmic Stability of Heavy-Tailed SGD with General Loss Functions, 2023.
[14] Z. Burda and J. Jurkiewicz. Heavy-tailed random matrices. The Oxford Handbook of Random Matrix Theory, 2011.
[15] J. Bouchaud and M. Potters. Financial applications of random matrix theory: a short review. The Oxford Handbook of Random Matrix Theory, 2011.
[16] A. Edelman, A. Guionnet, and S. Péché. Beyond Universality in Random Matrix Theory. The Annals of Applied Probability, 26(3), 2016.
[17] G. Ben Arous and A. Guionnet. The Spectrum of Heavy Tailed Random Matrices. Springer, 2017.
[18] E. Rebrova. Spectral Properties of Heavy-Tailed Random Matrices. ProQuest Dissertations & Theses, 2018.
[19] J. Nair, A. Wierman, and B. Zwart. The fundamentals of heavy-tails: properties, emergence, and identification. Proceedings of ACM SIGMETRICS, 387–388, 2013.
[20] S. Liu, W. Du, Y. Li, Z. Li, Z. Zheng, C. Duan, Z. Ma, O. Yaghi, A. Anandkumar, C. Borgs, J. Chayes, H. Guo, and J. Tang. Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials. NeurIPS, 2023.
[21] A. Auffinger, G. Ben Arous, and S. Péché. Poisson convergence for the largest eigenvalues of heavy tailed random matrices. Annales de l'I.H.P. Probabilités et Statistiques, 45(3):589–610, 2009.

 

 


In light of my recent technical feedback on Wolfram's Fundamental Theory:

I wanted to share my perspective on why the idea of a 'fundamental theory' or a 'theory of everything' is fundamentally flawed and doomed from the start. My thoughts have been largely inspired by mathematics and coincide with the ruminations Edward Frenkel has shared with both Curt Jaimungal and the Science and Nonduality folks. I will try to use as many metaphors as possible without bogging down the discussion with too much technical detail.

For the sake of epistemic neatness, let me first specify that when speaking of a theory, I am referring to a model of the world that works through the creation/study of abstract structures and their relationships, i.e., the map, not the territory. A theory is fundamentally different from an ontology (the matter/composition of the terrain), and the two should not be conflated. An ontology gives no 'shape' to the territory, but it 'informs' what structure you can build on top of it, the same way you can't build a castle from sand.

The underlying premise of a unifying theory is the idea of a 'universal terrain' - an underlying structure that can be inferred through the various ways of mapping the same territory. Whether it is topography (height map), seismic analysis (depth map) or standard cartography (landscape), the terrain manifests itself differently in each instance, but still possesses some invariant qualities across each type of measurement. For example, a desert doesn't have caves or tall mountains and is more spread out compared to a cavernous karst landscape.

The presiding epistemological assumption in Physics is the existence of such a landscape - a grand unified theory that can be found across all our methods of sampling reality, whether electromagnetic, gravitational or nuclear in nature. The underlying belief is that once such a Unified Field Theory is found, everything else will fall into place. This, of course, is nothing more than a pipe dream. Contemporary physics itself does not operate under that assumption. The problem with it stems from the fundamentally computationally intractable nature of reality (postulate 1, 'Chaos'):

If you have heard of the 'Three-Body Problem' (I recommend the book), then you already know that there are fairly simple systems (three bodies acting on each other through gravity) for which we can no longer find closed-form solutions. This, in fact, turns out to be the rule and not the exception. All our calculable models are fundamentally idealizations - when studying Newtonian mechanics, you always ignore friction. When studying many-body quantum systems, you try to minimize long-distance interactions. When trying to fight a virus, we do not study its quantum composition, only its molecular/cellular structure. The aforementioned intractability provably appears also in fluids, black holes, Ising models, Lorenz systems, etc. Now it seems we can't solve anything. Or can we?

The person who made the most progress on the aforementioned three-body problem is Henri Poincaré, a true giant in the field of mathematics. He not only proved that no general closed-form solution exists, he developed entirely new methods in topology and symplectic geometry that allowed him to identify 'homoclinic' orbits - by creating new models of abstraction, he was able to discern properties of the system despite its fundamental intractability. This approach of abstraction is in fact universal (postulate 3, 'Order').

A great anecdote Frenkel gave is that when you are playing chess, you don't care about calculating every quantum state of the system, and even if you could (we can't, by a factor of 10^23), it still wouldn't help you win the game. Combining this with the prior realization about the intractable nature of reality helps us approach a greater understanding, one that has in fact been foundational to the true nature of mathematics - the study of structure and abstraction itself.

Instead of seeking a 'unifying theory', we should marvel at the complexity of reality and approach problems at each level (physics, chemistry, biology, computer science, etc.) by applying models and ways of thinking from across the fields, without any 'elitism' as to which science is the 'fundamental' one. And yet, here is why 'mathematics is the queen of all sciences', as Gauss said: because Mathematics does not present itself as a fundamental theory, as it does not make prescriptive statements about the landscape. Rather, it is a universal language that helps us map and compare all the landscapes there are. Even deeper, at its essence, unification in Mathematics is about building bridges across the continents.


So, I was tinkering with how I could add a specific access policy to the hosts file, in order to help me monitor website usage and access. Turns out, you can't really do that - UNIX-based systems do not offer fine-grained access control out of the box, which baffles me. That should be the fundamental premise of any operating system going forward. You can no longer treat the user experience as the primary interface, due to the growing nature of hybrid systems. One can imagine the absolute necessity of containing future models, crafting dedicated access policies for data and monitoring, s.t. containment is baked in as part of the architecture.

For now, NixOS + SELinux (but not really, since both are UNIX-based) is the basic premise I have in mind (SENix?). I probably won't be able to work on this until after April 2025, but it is definitely something that I should consider as part of the organizational and infrastructural innovations that are necessary for a truly revolutionary startup.


A very interesting article from Quanta referring to this general trend of considering space-time as potentially an emergent phenomenon:

https://www.quantamagazine.org/the-unraveling-of-space-time-20240925/

The language of Operator Algebras (attributed to von Neumann) used to describe such emergent properties lies precisely in the field that I am currently studying, namely Functional Analysis. As noted in the first post, the underlying technical basis of the paradigm is the categorification of these operads w.r.t. the topological spaces on which they act. The article elaborates on how this 'long-forgotten math could decode' nature if it were a hologram (i.e., emergent).

A key quote I want to point out from the article above: 'Von Neumann and a collaborator, Francis Murray, eventually identified three types of operator algebras. Each one applies to a different kind of physical system. The systems are classified by two physical quantities: entanglement and a property called entropy.'

Needless to say, all of this is just further verification of what I have been describing so far. It feels synchronistic, as if all of these are hints/breadcrumbs. Hard to describe. But there hasn't been a point in my life where I have felt a stronger sense of purpose. Only time will tell what comes out of it.


The three types of operator algebras are the following: 

'Type I algebras are the simplest. They describe systems with a finite number of parts, which can be completely disentangled from the rest of the universe.'

'Type II algebras are trickier. They describe systems that have an infinite number of parts, all inextricably entangled with the outside. Absolute entropy is infinite'

Type III [dubbed the 'alien algebra'] 'is the worst: It describes a system with infinite parts, infinite entanglement with the outside, and no uniform pattern in the entanglement to help you get oriented. Not even changes in entropy are knowable.'


Digging down the rabbit hole of von Neumann algebras, I discovered the work of Alain Connes, the Fields medalist who set the rigorous foundations for Type III von Neumann algebras. He is the architect of the field of noncommutative geometry, building on top of John von Neumann's work by reframing Riemannian geometry through a spectral triple: an algebra of operators A acting on a Hilbert space H, together with an unbounded operator D. The following are my notes from his expository papers 'Noncommutativity and Physics' and 'Noncommutative Geometry: The Spectral Point':

- The encoding by a noncommutative algebra retains more information than its commutative counterpart - similar to how a word requires an exact ordering of symbols, instead of an anagram.

- Noncommutative spaces are dynamical and possess a canonical time evolution, or as Alain puts it, "time emerges from noncommutativity". They can in fact be considered as thermodynamical objects (possessing entropy due to this precise temporal directionality).

- The Riemannian geometric paradigm is extended to the noncommutative world in an operator theoretic and spectral manner. This means that the entirety of current Differential geometry (the basis of all physics phase spaces) can be embedded within this larger paradigm (in line with the Hamiltonian/Langevin formalization direction)

- A real variable is just a self-adjoint operator on a Hilbert space - this is what will allow for the categorification of any random/probabilistic system, and it is in fact already being done using the exact tools Connes used (Tomita–Takesaki theory) as part of this larger categorification of probability theory.
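
To make the spectral-triple picture from these notes concrete, the standard commutative example (a textbook fact rather than something taken from the papers above) recovers ordinary Riemannian geometry:

```latex
% The canonical commutative spectral triple on a closed spin manifold M:
% smooth functions, the Hilbert space of spinors, and the Dirac operator.
\[
  (\mathcal{A}, \mathcal{H}, D) \;=\; \bigl( C^{\infty}(M),\; L^{2}(M, S),\; D \bigr).
\]
% Connes' distance formula recovers the geodesic distance purely spectrally,
% which is the sense in which (A, H, D) "reframes" the metric; dropping
% commutativity of A is what opens the noncommutative generalization.
\[
  d(x, y) \;=\; \sup \bigl\{\, |f(x) - f(y)| \;:\; f \in \mathcal{A},\ \bigl\|[D, f]\bigr\| \le 1 \,\bigr\}.
\]
```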

This is extremely encouraging. The amount of endorphins in my brain from seeing all the connections is truly unparalleled. 


Covered impressive ground for my thesis. Discovered the following connection between the spectral behavior of heavy-tailed matrices and outstanding open conjectures in the field of quantum chaos/information theory:

- Berry's conjecture (Berry 1977) and the Quantum Unique Ergodicity (QUE) conjecture state that as one moves to higher energies (the semiclassical limit) in a quantum system, the eigenfunctions tend to delocalize and behave statistically like Gaussian random fields, corresponding to random superpositions of plane waves.

The transition from delocalized to localized behavior (Anderson localization) occurring past the mobility edge in Heavy-Tailed Matrices (Aggarwal 2022) finds direct parallels with Berry's conjecture - the delocalized region (the bulk of the spectrum) is described by the same type of Wigner-Dyson statistics that applies in the high-energy limit of quantum systems.
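
A rough toy check of this localization contrast (my own illustrative experiment, not taken from Aggarwal's analysis): compare the inverse participation ratio (IPR) of edge eigenvectors for a Gaussian versus a Cauchy-entry symmetric matrix; delocalized eigenvectors have IPR of order 1/N, localized ones have IPR of order 1.

```python
import numpy as np

def mean_edge_ipr(H: np.ndarray, k: int = 20) -> float:
    """Average inverse participation ratio of the eigenvectors attached
    to the k largest eigenvalues of the symmetrized matrix."""
    S = (H + H.T) / 2
    _, vecs = np.linalg.eigh(S)          # columns are orthonormal eigenvectors
    top = vecs[:, -k:]
    return float(np.mean(np.sum(top**4, axis=0)))

N = 1000
rng = np.random.default_rng(0)
gaussian = rng.normal(size=(N, N)) / np.sqrt(N)          # light tails (GOE-like)
cauchy = rng.standard_cauchy(size=(N, N)) / np.sqrt(N)   # mu = 1, very heavy tails

print("Gaussian edge IPR:", mean_edge_ipr(gaussian))  # ~ 3/N: delocalized
print("Cauchy edge IPR:  ", mean_edge_ipr(cauchy))    # O(1): localized edge eigenvectors
```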

Here comes the most interesting part: in contrast to the light-tailed (Gaussian) case for random matrices, where QUE applies, the heavy-tailed case seems to go in the opposite direction of what is predicted by QUE, potentially pointing to a counter-example that would disprove the latter conjecture by constructing a quantum system with heavy-tailed behavior.

This is a very early and very informal inkling that I have, but it exactly aligns with the larger recontextualization of  'chaos' and 'entropy' - whilst the standard scientific intuition for what those mean (going from order -> disorder) applies to equilibrium systems, exactly the opposite seems to occur for non-equilibrium systems (our planet, every life form and organism are out-of-equilibrium systems).

One of the most recent Kurzgesagt videos, about the total biomass of the deep Earth biome exceeding that of surface organisms by multiples, seems to be another nail in the coffin of the idea that life is entropically improbable. Turns out, exactly the opposite is true (Earth's crust is more entropic by orders of magnitude).

 


Without explicitly trying to shit on Stephen Wolfram, I can't help it, so I will only say this - the 'hypergraphs' that he claims to have established as the basis for his 'fundamental theory' are in fact only a special case of a far more general and widely applicable structure, namely that of semi-simplicial complexes. His 'hypergraph' is a semi-simplicial complex of dimension 2, whereas one can generalize this to any dimension (including infinite) and actually make rigorous statements about them and their emergent properties, which is not something he has ever done.

Why am I writing this? Because, as it turns out, when one defines a random walk on a graph (i.e., you start from a random vertex and with equal probability traverse any of the incident edges), you can in fact infer many of its "topological and spectral properties, such as connectedness, bipartiteness and spectral gap magnitude" ([ref]). The latter reference in fact generalizes this to arbitrary simplicial complexes, allowing one to represent much higher-order interactions.
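
A minimal sketch of that inference (assuming a simple undirected graph given by its adjacency matrix; my own illustration, not the referenced paper's construction):

```python
import numpy as np

def walk_spectrum(A: np.ndarray, tol: float = 1e-9) -> dict:
    """Spectral reading of a simple random walk on an undirected graph.
    A is a symmetric 0/1 adjacency matrix with no isolated vertices."""
    deg = A.sum(axis=1)
    P = A / deg[:, None]                  # transition matrix P = D^{-1} A
    # P is similar to the symmetric D^{-1/2} A D^{-1/2}, so its spectrum is real
    lam = np.sort(np.linalg.eigvals(P).real)[::-1]
    return {
        "connected": lam[1] < 1 - tol,    # eigenvalue 1 is simple iff connected
        "bipartite": lam[-1] < -1 + tol,  # -1 in the spectrum iff bipartite (connected case)
        "spectral_gap": 1 - lam[1],       # gap below the top eigenvalue 1
    }

# toy usage: a 6-cycle is connected and bipartite, with spectral gap 0.5
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
print(walk_spectrum(A))
```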

Why do we care? Because, as it turns out, those structures sit at the basis of modern computer science [ref], as Avi Wigderson, an Abel Prize and Turing Award winner, states. He has been at the forefront of the field of computational complexity and has genuinely reshaped the Theory of Computation, in particular through the hardness-vs-randomness results giving strong evidence that deterministic and randomized Turing machines are equally powerful, work central to his Turing Award.

TLDR: If you are interested in Complexity and the Theory of Computation, study Avi Wigderson, not the crackpot Stephen Wolfram.


'Intelligence at the Edge of Chaos' - a paper from Yale published 3 days ago, demonstrating that generalization and model capacity are closely tied to the underlying 'chaos', i.e., the complexity of the training data.

