Notes on Hugo Cisneros

Recent content in Notes on Hugo Cisneros: https://hugocisneros.com/notes/ (last updated Tue, 29 Oct 2024 08:26:00 +0100)

Giffen goods
https://hugocisneros.com/notes/giffen_goods/ (Tue, 29 Oct 2024 08:26:00 +0100)
tags: Economics

Quantization
https://hugocisneros.com/notes/quantization/ (Mon, 01 May 2023 08:18:00 +0200)
tags: Computer science, Neural networks
Quantization aims to make neural networks more efficient by simplifying their computations. Floating-point operations are replaced by operations on smaller number types (e.g., 8-bit integers) by quantizing the parameters. The challenge is to preserve the accuracy of the model through this conversion.
Quantization of large language models
The LLM.int8() paper (Dettmers et al. 2022) explains some interesting issues and solutions for the quantization of transformer-based large language models.

Universal basic income
https://hugocisneros.com/notes/universal_basic_income/ (Mon, 10 Apr 2023 22:31:00 +0200)
tags: Economics, Economic liberalism
Definition
Haagh defines UBI as the desire to ‘give all residents a modest regular income grant that is not dependent on means-tests or work-requirements’ (Haagh 2019).
Some criticisms of UBI
In (Harris 2023), the author argues that rather than disrupting capitalism, UBI implementations risk reinforcing the modalities of thought that neoliberalism uses to govern.
Bibliography
Louise Haagh. 2019. The Case for Universal Basic Income. The Case for.

Automation
https://hugocisneros.com/notes/automation/ (Mon, 27 Mar 2023 08:05:00 +0200)
tags: Economics, Artificial Intelligence
Labor and automation resources: Interview with Juan Sebastian Carbonell, Interview with Aaron Benanav

Gopher
https://hugocisneros.com/notes/gopher/ (Wed, 22 Feb 2023 13:28:00 +0100)
tags: Transformers, GPT
paper: (Rae et al. 2022)
Architecture
This model is very similar to GPT-2 but uses RMSNorm instead of LayerNorm, and relative rather than absolute positional encodings.
Parameter count
280B
Bibliography
Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, et al. January 21, 2022. "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv. DOI.

Online privacy
https://hugocisneros.com/notes/online_privacy/ (Wed, 22 Feb 2023 13:28:00 +0100)
tags: Privacy
IP addresses
In (Mishra et al. 2020), the authors analyze a set of users’ internet traffic over more than 100 days. They observe that a little more than 11% of the 34,488 IP addresses they collected were present for more than a month. Many of them were reused throughout the whole experiment, making long-term tracking of users possible. The study also shows that 93% of users had a unique fixed set of IP addresses during the whole experiment, making it easy to track them between home, work, etc.

Sparrow
https://hugocisneros.com/notes/sparrow/ (Wed, 22 Feb 2023 13:28:00 +0100)
tags: Transformers, GPT, Chinchilla
paper: (Glaese et al. 2022)
blog post: DeepMind announcement blog post
Architecture
It starts from the Chinchilla 70B model and adds RLHF (Reinforcement Learning with Human Feedback). It also adds inline evidence, like GopherCite.
Parameter count
70B
Bibliography
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, et al. September 28, 2022. "Improving Alignment of Dialogue Agents via Targeted Human Judgements". arXiv.

Surveillance
https://hugocisneros.com/notes/surveillance/ (Wed, 22 Feb 2023 13:28:00 +0100)
tags: Privacy, Society
CCTV
Surveillance and crime prevention: (Piza et al. 2019)
Bibliography
Eric L. Piza, Brandon C. Welsh, David P. Farrington, Amanda L. Thomas. February 2019. "CCTV Surveillance for Crime Prevention: A 40‐year Systematic Review with Meta‐analysis". Criminology & Public Policy 18 (1):135–59. DOI.

Ssh
https://hugocisneros.com/notes/ssh/ (Tue, 21 Feb 2023 16:52:00 +0100)
tags: Cryptography
SSH random art algorithm
It is described in a paper by Dirk Loss, Tobias Limmer, and Alexander von Gernler.

Privacy
https://hugocisneros.com/notes/privacy/ (Tue, 21 Feb 2023 15:48:00 +0100)
tags: Society

LayerNorm
https://hugocisneros.com/notes/layernorm/ (Tue, 21 Feb 2023 15:44:00 +0100)
tags: Neural networks
paper: (Ba et al. 2016)
Definition
Layer normalization is a technique used in deep learning to normalize the inputs to a layer of a neural network. In batch normalization, each input feature is normalized using its mean and variance computed over the current batch.
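Batch normalization and layer normalization differ mainly in the axis over which the statistics are computed. The following minimal NumPy sketch assumes a 2D activation matrix `x` of shape (batch, features). It omits the learned scale and shift parameters that both methods normally apply, and the function names are illustrative:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Per-feature statistics, computed across the batch (axis 0).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_norm(x, eps=1e-5):
    # Per-example statistics, computed across that example's features (axis 1).
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 8))  # 4 examples, 8 features

bn = batch_norm(x)
ln = layer_norm(x)
# Columns of bn are centered; rows of ln are centered.
print(np.allclose(bn.mean(axis=0), 0.0), np.allclose(ln.mean(axis=1), 0.0))
```

Because layer normalization only uses one example at a time, it behaves identically at training and inference time, whatever the batch size.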
In layer normalization, the mean and variance are instead computed over all the features of a layer for a single example (i.e., all the inputs for a given instance), so the normalization does not depend on the other examples in the batch.

Batch normalization
https://hugocisneros.com/notes/batch_normalization/ (Tue, 21 Feb 2023 14:47:00 +0100)
tags: Neural networks

ChatGPT
https://hugocisneros.com/notes/chatgpt/ (Mon, 13 Feb 2023 13:26:00 +0100)
tags: GPT, Transformers, NLP
blog post: OpenAI blog post
Architecture
ChatGPT takes a pretrained GPT-3.5 model (aka GPT-3 Davinci-003) and fine-tunes it with RLHF, similarly to InstructGPT but with some differences in the data collection. It is also more than “just” a model, since it includes extensions for a memory store and retrieval similar to BlenderBot 3.
Parameter count
175B

Reinforcement learning with human feedback
https://hugocisneros.com/notes/reinforcement_learning_with_human_feedback/ (Mon, 13 Feb 2023 13:23:00 +0100)
tags: Reinforcement learning, NLP

BlenderBot 3
https://hugocisneros.com/notes/blenderbot_3/ (Mon, 13 Feb 2023 13:18:00 +0100)
tags: Transformers, GPT, OPT: Open Pre-trained Transformer, NLP
blog post: Meta AI announcement blog post
paper: (Shuster et al. 2022)
Architecture
It is based on a pre-trained OPT model, with some optimizations that make it a better dialog agent, such as long-term memory and the ability to search the web. It uses human feedback to fine-tune its results on some tasks.
Parameter count
175B
Bibliography
Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, et al.

Lenia
https://hugocisneros.com/notes/lenia/ (Thu, 02 Feb 2023 18:23:00 +0100)
tags: Cellular automata
papers: (Chan 2019; Chan 2020)
Lenia is a continuous cellular automaton initially developed by Bert Chan.
It is sometimes referred to as a “continuous Conway’s Game of Life”.
Definition
Lenia's dynamics are defined by the following update of the scalar field $\mathbf{A}$, given a convolution kernel $\mathbf{K}$ and a growth mapping $G$. The clipping $[\cdot]_0^1$ keeps the field values in $[0, 1]$:
\begin{equation} \mathbf{A}^{t+\Delta t} = \left[ \mathbf{A}^{t} + \Delta t \;G \big(\mathbf{K}*\mathbf{A}^t\big) \right]_0^1 \end{equation}

Neural network training
https://hugocisneros.com/notes/neural_network_training/ (Thu, 02 Feb 2023 18:19:00 +0100)
tags: Neural networks, Machine learning, Optimization
Neural networks are commonly trained by gradient descent, with the gradients computed by backpropagation.
Neural network training as development in program space
A neural network as a whole can be seen as a dynamical system. Its state is the collection of its parameters, and its evolution function is the optimization step taken when training the network. A neural network has parameters $\theta_t$ at time $t$, which can be seen as its state.

Neural architecture search
https://hugocisneros.com/notes/neural_architecture_search/ (Thu, 02 Feb 2023 18:17:00 +0100)
tags: Search, Neural networks
Neural architecture search (NAS) is a family of methods for automatically finding neural network architectures. It is usually based on three main components:
- Search space: the types of networks that can be built.
- Search strategy: the approach used to explore that space.
- Performance estimation strategy: the way the performance of a candidate network is evaluated, ideally without actually building it or fully training and running it.
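To make the three components concrete, here is a toy random-search sketch in Python. The search space (depth, width, activation), the random-search strategy, and the `estimate_performance` proxy are all hypothetical placeholders chosen for illustration. A real system would score each candidate by training it (fully or partially) or with a cheaper learned estimator:

```python
import random

# Search space: hypothetical choices for a simple feed-forward network.
SEARCH_SPACE = {
    "depth": [1, 2, 3, 4],
    "width": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
}

def sample_architecture(rng):
    # Search strategy (random search): pick one option per dimension.
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def estimate_performance(arch):
    # Performance estimation: a made-up proxy score standing in for
    # (partial) training and validation. It favors ~10k weights and ReLU.
    n_params = arch["depth"] * arch["width"] ** 2
    bonus = 0.1 if arch["activation"] == "relu" else 0.0
    return -abs(n_params - 10_000) / 10_000 + bonus

def random_search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

print(random_search())
```

Random search is a common baseline strategy. Smarter strategies (reinforcement learning, evolution, gradient-based relaxation) replace `sample_architecture`, while cheaper estimators replace `estimate_performance`.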
Reinforcement learning-based NAS
The original approach, called Neural architecture search, uses an RNN as a controller that generates architectures.

Backpropagation
https://hugocisneros.com/notes/backpropagation/ (Thu, 02 Feb 2023 18:16:00 +0100)
tags: Algorithm, Neural networks

Notes on: Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks by Voelker, A., Kajić, I., & Eliasmith, C. (2019)
https://hugocisneros.com/notes/voelkerlegendrememoryunits2019/ (Thu, 02 Feb 2023 16:38:00 +0100)
tags: Recurrent neural networks
source: (Voelker et al. 2019)
Summary
This paper introduces the LMU recurrent cell. Like the LSTM, this cell maintains a hidden memory state. The main idea of the paper is to make this memory satisfy a set of first-order linear ordinary differential equations:
\begin{equation} \theta \dot{m}(t) = Am(t) + Bu(t) \end{equation}
where $\theta$ is the length of the window. This system has a solution that represents a sliding window of $u$ via Legendre polynomials.

Energy
https://hugocisneros.com/notes/energy/ (Fri, 27 Jan 2023 11:03:00 +0100)
tags: Climate
Renewable energy
Non-renewable energy

Carbon emissions
https://hugocisneros.com/notes/carbon_emissions/ (Tue, 24 Jan 2023 16:28:00 +0100)
tags: Climate
Scopes of carbon emissions
Greenhouse gas emissions are usually divided into categories, called scopes, that correspond to different levels of involvement in the emissions. An entity is responsible for all these emissions, but it may have different levels of control over the various scopes.
Scope 1: Direct emissions
This is the CO2 or other greenhouse gas (GHG) that the organization emits directly.
Examples: direct fossil fuel burning, methane leaks, etc.

Carbon capture
https://hugocisneros.com/notes/carbon_capture/ (Tue, 24 Jan 2023 16:20:00 +0100)
tags: Climate, Carbon emissions
Carbon capture consists of using devices to capture and store some of the carbon emitted in the past or about to be emitted in the near future.
Carbon capture will be too expensive for too little results
‘Carbon Capture’ Is No Fix. Big Oil’s Known for Decades (The Tyee, 7 July 2022)

Carbon offsetting
https://hugocisneros.com/notes/carbon_offsetting/ (Tue, 24 Jan 2023 16:18:00 +0100)
tags: Climate, Carbon emissions
The Carbon Con, an article from Source Material

Carbon tax
https://hugocisneros.com/notes/carbon_tax/ (Tue, 24 Jan 2023 16:18:00 +0100)
tags: Climate, Economic liberalism, Taxation
The idea behind the carbon tax is to tax carbon emissions from private entities, public institutions, or individuals, so as to create an economic incentive to emit less carbon dioxide.
Carbon tax and redistribution
Perception and public opinion
Slides by Mathild Mus (in French)

Climate
https://hugocisneros.com/notes/climate/ (Mon, 16 Jan 2023 21:10:00 +0100)
tags: Complex Systems
International environmental agreements
From (Pouw et al. 2022):
First, at the international level, universal coalitions are more cost-efficient and effective than fragmented regimes, but more difficult to negotiate and less stable. Second, in developing countries, there is need for substantial external funding to cover the short-run costs of environmental compliance. Third, market-based solutions have been increasingly applied in international agreements but with mixed results.
Climate policies
Acceptability of climate policies
From (Dechezleprêtre et al.

Gig economy
https://hugocisneros.com/notes/gig_economy/ (Wed, 11 Jan 2023 15:50:00 +0100)
tags: Economics, Economic liberalism
Flexible work and exploitation: (Chung 2022)
Gig economy and unions: (Gray 2022)
Bibliography
Heejung Chung. 2022. The Flexibility Paradox: Why Flexible Working Leads to (self-)exploitation. Bristol: Policy Press.
Paul Christopher Gray. 2022. "“The Same Tools Work Everywhere”: Organizing Gig Workers with Foodsters United". Labour / Le Travail 90 (1). The Canadian Committee on Labour History: 41–84. https://muse.jhu.edu/pub/151/article/870057.

ALife 2020
https://hugocisneros.com/notes/alife_2020/ (Wed, 11 Jan 2023 09:48:00 +0100)
tags: ALife Conference, Artificial life
Day 1
- Tutorial: Functional programming for artificial life
- Tutorial: Visualization Principles and Techniques for Research in ALife
- Mike Levin: Keynote Lecture
Day 2
- Sara Walker, keynote lecture: about what life means and how it can be defined from the point of view of physics, information theory, etc.
- Melanie Mitchell, keynote lecture: this talk was very similar to another one I watched from the Santa Fe Institute, which promotes her book Artificial Intelligence: A Guide for Thinking Humans.

Transformers
https://hugocisneros.com/notes/transformers/ (Thu, 05 Jan 2023 14:13:00 +0100)
tags: Neural networks
resources: Transformer catalog, The Illustrated Transformer
Transformers are a neural network architecture based on a mechanism called attention. Their success in NLP started around the publication of a very influential paper by Vaswani and colleagues (Vaswani et al. 2017). Transformers turned out to be very effective language models.
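The core of the attention mechanism is scaled dot-product attention, $\mathrm{softmax}(QK^T/\sqrt{d_k})V$, as defined in (Vaswani et al. 2017). Below is a minimal single-head NumPy sketch. It leaves out masking, multiple heads, and the learned projection matrices, and the query, key, and value matrices are random placeholders:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q: (n_queries, d_k), k: (n_keys, d_k), v: (n_keys, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # (n_queries, n_keys) similarities
    weights = softmax(scores, axis=-1)  # each row is a distribution over keys
    return weights @ v, weights         # output: (n_queries, d_v)

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))
k = rng.normal(size=(5, 4))
v = rng.normal(size=(5, 2))
out, weights = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (3, 2)
```

Each output row is a weighted average of the value vectors. The weights depend on how well the corresponding query matches each key.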
Transformers have also spread to other fields of machine learning, such as computer vision and reinforcement learning.

Radon transform
https://hugocisneros.com/notes/radon_transform/ (Mon, 02 Jan 2023 11:36:00 +0100)
tags: Signal processing
From a tweet: the marginals $f_X$ and $f_Y$ of a joint distribution $f(x, y)$ can be seen as the Radon transform of $f(x, y)$ in the $\theta = 0$ and $\theta = \pi/2$ directions. Similarly, a joint distribution can be thought of as an optimal transport solution to an undersampled tomography result.

Generative art
https://hugocisneros.com/notes/generative_art/ (Wed, 28 Dec 2022 13:44:00 +0100)
tags: Art, Algorithm

Creative coding
https://hugocisneros.com/notes/creative_coding/ (Wed, 28 Dec 2022 13:43:00 +0100)
tags: Coding, Art

Notes on: Efficient Neural Architecture Search via Parameter Sharing by Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., & Dean, J. (2018)
https://hugocisneros.com/notes/phamefficientneuralarchitecture2018/ (Fri, 23 Dec 2022 17:40:00 +0100)
tags: Neural architecture search
source: (Pham et al. 2018)
Summary
As in other NAS papers, the controller is an RNN that generates each part of the architecture in sequence. The main contribution of this paper is to introduce parameter sharing between child models. To do this, it represents all possible architectures in a single DAG of operations, and child models that use the same operation share its weights.
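The weight-sharing idea can be illustrated with a toy Python sketch. This is not the ENAS implementation. A single bank of parameters is indexed by (node, operation) in the shared DAG, and every sampled child architecture looks up the entries for the operations it selects. Training one child therefore updates parameters that other children reuse. The operation names, DAG size, scalar "weights", and helper functions are all hypothetical simplifications, and random sampling stands in for the RNN controller:

```python
import random

# Hypothetical, simplified search space: each node of the shared DAG
# chooses one operation from this list.
OPERATIONS = ["conv3x3", "conv5x5", "maxpool", "identity"]
NUM_NODES = 4

# One shared bank of parameters for the whole DAG, one entry per (node, operation).
# Real models would store weight tensors; a float stands in for them here.
shared_weights = {(node, op): 0.0 for node in range(NUM_NODES) for op in OPERATIONS}

def sample_child(rng):
    # Random sampling stands in for the RNN controller.
    return [rng.choice(OPERATIONS) for _ in range(NUM_NODES)]

def child_parameters(child):
    # A child owns no parameters: it only refers to entries of the shared bank.
    return [(node, op) for node, op in enumerate(child)]

def fake_training_step(child, lr=0.1):
    # Placeholder for a gradient step: only the shared entries the child uses change.
    for key in child_parameters(child):
        shared_weights[key] += lr

rng = random.Random(0)
child_a = sample_child(rng)
child_b = sample_child(rng)
fake_training_step(child_a)

# Any (node, operation) that child_b shares with child_a already carries child_a's update.
reused = set(child_parameters(child_a)) & set(child_parameters(child_b))
print(child_a, child_b, {key: shared_weights[key] for key in reused})
```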
The authors explain how to use their approach to design an RNN cell, a convolutional network, and a convolutional cell (to build a CNN), and how to train these models.

Art
https://hugocisneros.com/notes/art/ (Thu, 22 Dec 2022 11:19:00 +0100)

Art with Cellular Automata
https://hugocisneros.com/notes/art_with_cellular_automata/ (Thu, 22 Dec 2022 11:19:00 +0100)
tags: Art
Cellular automata have often been used to create various forms of generative art. Here is a collection of some interesting examples:
Examples based on Neural CA
- Self-organizing textures (Niklasson et al. 2021)
- Dialogue: an art project using interacting Neural CA to generate patterns
Revisiting classical CA
- Crosshatch CA
CA music
- Wolfram tones
Bibliography
Eyvind Niklasson, Alexander Mordvintsev, Ettore Randazzo, Michael Levin. February 11, 2021. "Self-organising Textures"

Machine learning
https://hugocisneros.com/notes/machine_learning/ (Wed, 21 Dec 2022 09:16:00 +0100)
tags: Artificial Intelligence, Applied maths
Machine learning is about constructing algorithms that can approximate complex functions from observations of input/output pairs. It is closely related to statistics, since its goal is to make predictions based on data.
Regression
The goal is to approximate a target function $f$ or signal $S$. The output space is often continuous.
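As a small illustration of regression, the following sketch fits a linear model by least squares to noisy samples of a target function, using NumPy's `np.linalg.lstsq`. The target $f(x) = 2x + 1$ and the noise level are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy input/output observations from a hypothetical target f(x) = 2x + 1.
x = rng.uniform(-1.0, 1.0, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.shape)

# Design matrix: one column for the slope, one column of ones for the intercept.
X = np.column_stack([x, np.ones_like(x)])

# Least-squares estimate of (slope, intercept).
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # close to 2 and 1
```

Here the output space is continuous (real numbers). A classifier would instead map each input to one of a discrete set of labels.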
Classification
The goal is to approximate a target function that assigns labels to input points. The output space is a discrete set of classes.

Huffman coding
https://hugocisneros.com/notes/huffman_coding/ (Tue, 20 Dec 2022 16:59:00 +0100)
tags: Compression, Entropy coding
Python implementation

from heapq import heappush, heappop, heapify

def huffman_coding(frequency_dict):
    # Create a heap of [frequency, [character, code]] entries
    heap = [[frequency, [character, ""]] for character, frequency in frequency_dict.items()]
    heapify(heap)
    while len(heap) > 1:
        # Extract the two nodes with the lowest frequencies
        left, right = heappop(heap), heappop(heap)
        # Prepend "0" to the codes in the left subtree and "1" to those in the right subtree
        for pair in left[1:]:
            pair[1] = "0" + pair[1]
        for pair in right[1:]:
            pair[1] = "1" + pair[1]
        # Merge the two nodes and add the resulting node back to the heap
        heappush(heap, [left[0] + right[0]] + left[1:] + right[1:])
    # Extract the coding dictionary from the heap, sorted by code length
    return dict(sorted(heappop(heap)[1:], key=lambda p: (len(p[-1]), p)))

# Example usage
frequency_dict = {'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5, 'r': 2, 'q': 1}
print(huffman_coding(frequency_dict))

The code above outputs the following Python dictionary:

Algorithmic bias
https://hugocisneros.com/notes/algorithmic_bias/ (Tue, 20 Dec 2022 16:25:00 +0100)
tags: Machine learning

Image classification
https://hugocisneros.com/notes/image_classification/ (Tue, 20 Dec 2022 16:12:00 +0100)
tags: Computer vision
Image classification is a machine learning task associated with computer vision. Its goal is to assign a class to a given image. When this class corresponds to an object depicted in the image, the task is called object recognition. More generally, image classification can be applied to many other areas.
For example, in medical imaging, one may want to design an algorithm that can classify
Convolutional neural networks have been successful at many image classification tasks, and are starting to be overtaken by transformers for some applications.

Object recognition
https://hugocisneros.com/notes/object_recognition/ (Tue, 20 Dec 2022 16:09:00 +0100)
tags: Computer vision

Resource curse
https://hugocisneros.com/notes/resource_curse/ (Tue, 20 Dec 2022 15:08:00 +0100)
tags: Economics
Definition
The resource curse is a phenomenon in which countries with an abundance of natural resources, such as oil, minerals, and other raw materials, often end up with slower economic growth and less democratic governments than countries without such resources. This may be because the wealth generated by the exploitation of these resources is not distributed evenly, which can lead to corruption, conflict, and social unrest. The resource curse is also referred to as the “paradox of plenty”.

Dutch disease
https://hugocisneros.com/notes/dutch_disease/ (Tue, 20 Dec 2022 14:54:00 +0100)
tags: Economics
resources: Wikipedia
Definition
From (Corden 1984):
The term Dutch Disease refers to the adverse effects on Dutch manufacturing of the natural gas discoveries of the nineteen sixties, essentially through the subsequent appreciation of the Dutch real exchange rate [footnote].
[footnote] The first printed reference to the term I have found is in the article “The Dutch Disease” in The Economist November 26th 1977, pp. 82-3.
After the discovery of the large Groningen natural gas field in 1959, the Dutch currency appreciated against other currencies thanks to increased exports.

Reinforcement learning
https://hugocisneros.com/notes/reinforcement_learning/ (Wed, 14 Dec 2022 20:41:00 +0100)
tags: Machine learning
In reinforcement learning, agents take actions within an environment. Usually, both the agent's state and the environment's state change in response to each action. The agent receives a reward that tells it how good or bad the action was. The goal of a learning agent is to act so as to maximize its cumulative reward over time. An agent can be anything from a fixed set of if-else statements to a deep neural network.

Batteries
https://hugocisneros.com/notes/batteries/ (Thu, 24 Nov 2022 21:07:00 +0100)
tags: Climate, Energy
Gigafactories
The word gigafactory was coined by Tesla, which started building its first massive factory in Nevada in 2014. The name refers to the output scale of these factories, on the order of gigawatt-hours (GWh) of battery capacity produced per year. The word is now used for battery factories in general. According to a study by Ultimate Media and ABB, global battery production was 450 GWh in 2020.

Fourier transform
https://hugocisneros.com/notes/fourier_transform/ (Wed, 16 Nov 2022 09:23:00 +0100)
tags: Mathematics, Signal processing
Gibbs phenomenon
The Gibbs phenomenon appears when a discontinuous function is approximated by a truncated Fourier series. The partial sums overshoot near each discontinuity. This overshoot (about 9% of the jump) does not decrease as more terms are added to the approximation; it only becomes narrower.

L-Systems
https://hugocisneros.com/notes/l_systems/ (Thu, 20 Oct 2022 15:11:00 +0200)
tags: Complex Systems
L-systems are string rewriting systems.
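At each step, an L-system applies its rules to every symbol of the current string in parallel. The minimal Python sketch below uses the fractal-tree rules (1 → 11) and (0 → 1[0]0) from this note's example, starting from the axiom 0. Symbols without a rule (here the brackets) are copied unchanged. The function name `lsystem` is just illustrative:

```python
def lsystem(axiom, rules, iterations):
    # Rewrite every symbol in parallel; symbols without a rule are kept as-is.
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(c, c) for c in s)
    return s

rules = {"1": "11", "0": "1[0]0"}
print(lsystem("0", rules, 1))  # 1[0]0
print(lsystem("0", rules, 2))  # 11[1[0]0]1[0]0
```

Drawing the result is a separate step: a turtle interprets each symbol as a drawing command.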
L-systems operate on an alphabet of symbols, rewriting symbols or patterns of symbols into new patterns according to a set of rules.
Examples
Fractal tree
A simple binary tree can be constructed with an L-system, using the rules (1 → 11) and (0 → 1[0]0) and starting from a single 0. The resulting string can then be drawn with a turtle, where:

Einops
https://hugocisneros.com/notes/einops/ (Wed, 19 Oct 2022 15:10:00 +0200)
tags: Coding
Einops is an array manipulation library that uses string descriptions of array operations to make complex manipulations (such as summing, broadcasting, and reshaping along specific axes only) easier.
Examples
Max pooling

import numpy as np
import einops as ein
img = np.random.rand(100, 60, 3)  # example (H, W, C) image, H and W divisible by 20
patches = ein.rearrange(img, '(h i) (w j) c -> h w i j c', i=20, j=20)  # 20x20 patches
max_pool = ein.reduce(patches, 'h w i j c -> h w c', 'max')  # max over each patch

Diffusion models
https://hugocisneros.com/notes/diffusion_models/ (Wed, 19 Oct 2022 15:07:00 +0200)
tags: Generative modelling
papers: (Sohl-Dickstein et al. 2015), (Ho et al. 2020)
Principle of diffusion
Forward diffusion
An $N \times N$ image $x_0$ with $c$ channels, i.e. a vector in $\mathbb{R}^{N \times N \times c}$, is diffused at each timestep $t$ to become $x_t$. The forward diffusion step is defined as follows, where $\beta_t$ is the variance of the noise added at step $t$:
\[ q(\boldsymbol{x}_t | \boldsymbol{x}_{t-1}) = \mathcal{N}(\boldsymbol{x}_t; \sqrt{1 - \beta_t} \boldsymbol{x}_{ t - 1 }, \beta_t I) \]
The probability of a sequence of images $x_1, \ldots, x_T$ given $x_0$ is then
\[ q(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_T | \boldsymbol{x}_0) = \prod_{t=1}^T q(\boldsymbol{x}_t|\boldsymbol{x}_{t -1}) \]

Frank-Wolfe algorithm
https://hugocisneros.com/notes/frank_wolfe_algorithm/ (Thu, 22 Sep 2022 12:44:00 +0200)
tags: Optimization, Algorithm
resources: Fabian Pedregosa’s series on FW
Definition
The algorithm was originally published in (Frank, Wolfe 1956), and (Jaggi 2013) gives a more recent overview.
For a differentiable function $f$ with $L$-Lipschitz gradient, whose domain $\mathcal{C}$ is a convex and compact set, we want to solve the optimization problem:
\[ \min_{\boldsymbol{x} \in \mathcal{C}} f(\boldsymbol{x}) \]
The algorithm starts with an initial guess $\boldsymbol{x}_0$ and constructs a sequence of iterates $\boldsymbol{x}_1, \boldsymbol{x}_2, \cdots$ which converges to a solution when $f$ is convex.

Notes on: Git Re-Basin: Merging Models modulo Permutation Symmetries by Ainsworth, S. K., Hayase, J., & Srinivasa, S. (2022)
https://hugocisneros.com/notes/ainsworthgitrebasinmerging2022/ (Mon, 19 Sep 2022 11:09:00 +0200)
source: (Ainsworth et al. 2022)
tags: Neural networks
Summary
This paper introduces various methods for matching and interpolating the weights of multiple neural networks that have the same architecture but were trained from different starting points or on different data. After training, these networks have different weight values, partly because hidden units can be permuted without changing the function a network computes.
Bibliography
Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa. September 11, 2022. "Git Re-basin: Merging Models Modulo Permutation Symmetries". arXiv. DOI.

Complex Systems
https://hugocisneros.com/notes/complex_systems/ (Wed, 07 Sep 2022 10:22:00 +0200)
tags: Physics
Definition
“By a complex system I mean one made up of a large number of parts that interact in a nonsimple way.” (Herbert Simon, 1962)
“Bottomless wonders spring from simple rules, which are repeated without end.” (Mandelbrot, ~1980)
When we talk about complex systems that evolve in time, we often use the term complex dynamical systems.
Examples of complex systems
- The economy (Anderson 1996)
- Boolean networks
- Cellular automata
- Neural networks
- Many physical systems
Understanding complex systems
There is an interesting series of articles about how different scientific disciplines approach the same problem: understanding an incredibly complex system that we initially know nothing about:

Neoliberalism
https://hugocisneros.com/notes/neoliberalism/ (Tue, 06 Sep 2022 08:35:00 +0200)
tags: Economics, Economic liberalism
Definition
In (Hay 2004), the author gives the following definition of neoliberalism:
Economic neoliberalism, I suggest, can be defined in terms of the following traits:
- A confidence in the market as an efficient mechanism for the allocation of scarce resources.
- A belief in the desirability of a global regime of free trade and free capital mobility.
- A belief in the desirability, all things being equal, of a limited and non-interventionist role for the state and of the state as a facilitator and custodian rather than a substitute for market mechanisms.

Gradient descent for wide two-layer neural networks – I : Global convergence
https://hugocisneros.com/notes/gradient_descent_for_wide_two_layer_neural_networks_i_global_convergence/ (Thu, 01 Sep 2022 08:46:00 +0200)
tags: Neural networks, Optimization
authors: Francis Bach, Lénaïc Chizat
source: Francis Bach’s blog
In what follows, we use the mathematical definition of a neural network from Neural networks.
Two-layer neural network
Even simple neural network models are very difficult to analyze. This is primarily due to two difficulties:
- Non-linearity: the problem is typically non-convex, which in general is a bad thing in optimization.
- Overparametrization: there are often a lot of parameters, sometimes many more parameters than observations.

Linear programming
https://hugocisneros.com/notes/linear_programming/ (Tue, 30 Aug 2022 21:26:00 +0200)
tags: Optimization
Linear programs are problems that can be expressed as
\begin{align} & \text{Find a vector} && \mathbf{x} \\ & \text{that maximizes} && \mathbf{c}^T \mathbf{x}\\ & \text{subject to} && A \mathbf{x} \leq \mathbf{b} \\ & \text{and} && \mathbf{x} \ge \mathbf{0}. \end{align}

Hopfield Networks
https://hugocisneros.com/notes/hopfield_networks/ (Tue, 30 Aug 2022 21:25:00 +0200)
tags: Neural networks
Hopfield networks are a kind of recurrent neural network with binary threshold nodes.
Definition
Nodes have indexes $i \in \{1, \cdots, n\}$ and are in state $s_i \in \{-1, 1\}$. Nodes are connected to each other, and each connection is characterized by a weight $w_{ij}$. Each node also has an associated threshold $\theta_i$, and its state is updated according to
\[ s_i \leftarrow \begin{cases} +1 & \text{if}\ \sum_j w_{ij} s_j \geq \theta_i, \\ -1 & \text{otherwise}. \end{cases} \]

Talk: Alife 2020 keynote Sara Walker - The Natural History of Information
https://hugocisneros.com/notes/talk_alife_2020_keynote_sara_walker_the_natural_history_of_information/ (Tue, 30 Aug 2022 21:18:00 +0200)
tags: Life
The problem of defining life
Definitions of life have always been elusive.
“Life does not exist.” (Andrew Ellington, American Chemical Society 2012)
“[A]s one focuses experimentally on any of the ‘defining’ properties of ‘life’, the sharp boundary seems to blur, splitting into finer and finer sub-divisions” (Jack Szostak, J. Biomolecular Struc. Dyn. 29.4 (2012): 599-600)
When looking at matter down to the chemical level, it’s hard to tell what is fundamentally different between living and non-living matter.

Cellular automata
https://hugocisneros.com/notes/cellular_automata/ (Tue, 30 Aug 2022 21:15:00 +0200)
tags: Emergence, Chaos, Artificial Intelligence
resources: Wikipedia, (Von Neumann, Burks 1966; Wolfram 2002)
Definition
A cellular automaton is a computational model defined on a regular grid of individual elements called cells. Each cell can be in one of a finite number of states (e.g., alive or dead, or $\{1, 2, 3\}$). A cellular automaton’s evolution is simulated in discrete timesteps. At each new timestep, every cell is updated according to a local evolution rule that depends on its own state and the states of its neighbors.

Finance
https://hugocisneros.com/notes/finance/ (Tue, 30 Aug 2022 19:01:00 +0200)
tags: Economics

Continual learning
https://hugocisneros.com/notes/continual_learning/ (Tue, 30 Aug 2022 16:30:00 +0200)
tags: Machine learning
Continual learning is a type of supervised learning where there is no separate “testing phase” associated with a decision process. Instead, the algorithm keeps processing training samples and has to make predictions while continuing to learn. This is challenging for a fixed neural network architecture: with its fixed capacity, it is bound to either forget things or be unable to learn anything new.

Futures contracts
https://hugocisneros.com/notes/futures_contracts/ (Tue, 30 Aug 2022 16:29:00 +0200)
tags: Finance, Economics

Unker non-linear writing system
https://hugocisneros.com/notes/unker_non_linear_writing_system/ (Fri, 19 Aug 2022 14:44:00 +0200)
tags: Language
resources: https://s.ai/nlws/
A fascinating writing system based on glyphs connected to each other to create meaning.
The system is quite advanced, and reading its grammar is like discovering a new alien language. It was created by Alex Fink and Sai in 2010. This is the kind of complexity that would be incredible to discover in open-ended evolving language systems. A system starting from elementary components, with no particular assumption about what language should be, could come up with such exotic models (probably even more exotic in the case of a truly open-ended system).

Optimal transport
https://hugocisneros.com/notes/optimal_transport/ (Thu, 18 Aug 2022 15:38:00 +0200)
tags: Applied maths
Ramified optimal transport
Introduction to ramified optimal transportation

Graham scan
https://hugocisneros.com/notes/graham_scan/ (Thu, 04 Aug 2022 14:10:00 +0200)
tags: Algorithm
Graham scan is an algorithm to find the convex hull of a set of points in 2D. It runs in $\mathcal{O}(n\log n)$ time. The algorithm is relatively simple. It starts by selecting the point with the lowest $y$-coordinate as a pivot. The remaining points are sorted once, by increasing angle with respect to the pivot, and then considered in that order. At each step, the next point in this order is considered; if this new point is

Kullback-leibler divergence
https://hugocisneros.com/notes/kullback_leibler_divergence/ (Thu, 04 Aug 2022 14:09:00 +0200)
tags: Applied maths
Definition
For $P, Q$ defined on the same probability space $\mathcal{X}$, the KL divergence of $Q$ from $P$ is
\[ KL(P, Q) = \sum_{x \in \mathcal{X}} P(x) \log\left( \frac{P(x)}{Q(x)} \right) \]
The KL divergence is not symmetric: in general, $KL(P, Q) \neq KL(Q, P)$. It has two main interpretations:
- It is the information gained by using the true distribution $P$ instead of $Q$, or equivalently the amount of information lost when approximating $P$ with $Q$.
The average extra code length needed to encode a sequence following $P$ with a code optimized for $Q$ instead of one optimized for $P$.Notes on: Fast and stable MAP-Elites in noisy domains using deep grids by Flageat, M., & Cully, A. (2020)https://hugocisneros.com/notes/flageatfaststablemapelites2020/Thu, 04 Aug 2022 14:09:00 +0200https://hugocisneros.com/notes/flageatfaststablemapelites2020/source (Flageat, Cully 2020) tags ALife 2020, MAP-Elites, Quality diversity Summary MAP-Elites can be problematic in the face of uncertainty because: individuals can be unexpectedly lucky; the behavior space can be hard to estimate, which results in misplaced individuals. Some mitigation techniques have been explored, e.g. in (Justesen et al. 2019), and this paper introduces another way of dealing with noisy domains without using sampling. The main idea is to replace the MAP-Elites grid by a “deep grid” with an additional dimension.Notes on: Resilient Life: An Exploration of Perturbed Autopoietic Patterns in Conway's Game of Life by Cika, A., Cohen, E., Kruszewski, G., Seet, L., Steinmann, P., & Yin, W. (2020)https://hugocisneros.com/notes/cikaresilientlifeexploration2020/Thu, 04 Aug 2022 14:08:00 +0200https://hugocisneros.com/notes/cikaresilientlifeexploration2020/tags ALife 2020, Cellular automata, Autopoiesis source (Cika et al. 2020) Summary This paper studies how resistant Game of Life (GoL) patterns are to perturbations, and which structures could enable this resistance. The authors also want to know whether resilience is a universal property of computational systems. They test two types of resilience: additive (add one or two live cells to the pattern) and negative (“kill” one or two live cells of the pattern). They use 3 metrics for resilience:Notes on: Safe Reinforcement Learning through Meta-learned Instincts by Grbic, D., & Risi, S.
(2020)https://hugocisneros.com/notes/grbicsafereinforcementlearning2020/Thu, 04 Aug 2022 14:07:00 +0200https://hugocisneros.com/notes/grbicsafereinforcementlearning2020/source (Grbic, Risi 2020) tags Meta-learning, Reinforcement learning, ALife 2020 Summary In RL, an important goal is to find agents that can quickly adapt to changing environments while avoiding unsafe states. However, deep RL often adds noise to actions to explore the action space, and this can lead the agent into unsafe parts of the state-action space. Figure 1: Slide from the ALife talk The meta-learning setting of MAML is adapted to RL, with a policy network learning the policy in a standard way and an “instinctual network” which is fixed for a group of tasks and modulates the regular policy with its own action vector.Talk: Alife 2020 keynote Michael Levin - Robot Cancerhttps://hugocisneros.com/notes/talk_alife_2020_keynote_michael_levin_robot_cancer/Thu, 04 Aug 2022 14:06:00 +0200https://hugocisneros.com/notes/talk_alife_2020_keynote_michael_levin_robot_cancer/tags Emergence, Biological life, ALife 2020 How do organisms store information and pass it down through very profound structural changes (from a caterpillar to a butterfly, when cutting a flatworm in multiple pieces, etc.)? Embryogenesis is a reliable self-assembly process. It relies on stem cell differentiation, but that’s not enough: some tumors (teratomas) are differentiated but don’t have the right 3D spatial organisation. Where is the large-scale pattern specified?Hilbert curve indexinghttps://hugocisneros.com/notes/hilbert_curve_indexing/Thu, 04 Aug 2022 14:05:00 +0200https://hugocisneros.com/notes/hilbert_curve_indexing/tags Coding Hilbert curves can be used for an interesting trick involving 2D array indexing. Consecutive positions along the Hilbert curve are always adjacent cells of the grid, so storing a 2D array in Hilbert order tends to keep neighboring elements close in memory. This can be more cache-friendly when neighbors of an array element are accessed frequently.
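To make the Hilbert indexing trick concrete, here is a minimal Python sketch of the coordinate conversion, ported from the C routines (`xy2d`, `d2xy` and the quadrant-rotation helper `rot`) on Wikipedia's Hilbert curve page. It assumes the grid side `n` is a power of two; the leading underscore on `_rot` is only a naming choice for this sketch.

```python
def _rot(n: int, x: int, y: int, rx: int, ry: int) -> tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x  # swap x and y
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Map (x, y) in an n-by-n grid (n a power of two) to its Hilbert index d."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)  # which quadrant, in curve order
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def d2xy(n: int, d: int) -> tuple[int, int]:
    """Map a Hilbert index d in [0, n*n) back to (x, y) coordinates."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


if __name__ == "__main__":
    # Order in which the cells of a 4x4 grid are visited by the curve.
    print([d2xy(4, d) for d in range(16)])
```

To store a 2D array in Hilbert order, element `(x, y)` would go to position `xy2d(n, x, y)` of a flat buffer of length `n * n`.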
Figure 1: Hilbert curve with different numbers of iterations. C implementation from Wikipedia to convert (x,y) coordinates to linear ones and vice versa:Notes on: Scaling down Deep Learning by Greydanus, S. (2020)https://hugocisneros.com/notes/greydanusscalingdeeplearning2020/Thu, 04 Aug 2022 14:03:00 +0200https://hugocisneros.com/notes/greydanusscalingdeeplearning2020/tags Neural networks source (Greydanus 2020) Summary This paper introduces a minimalist 1D version of the MNIST dataset for studying some basic properties of neural networks. The author simplifies the MNIST dataset by assigning a 1D glyph to each digit. These glyphs are padded, translated, sheared and blurred to build a dataset of many different examples. The figure from the paper shown below illustrates this dataset’s construction: Figure 1: 1D simple MNISTSemantic similarityhttps://hugocisneros.com/notes/semantic_similarity/Thu, 04 Aug 2022 13:42:00 +0200https://hugocisneros.com/notes/semantic_similarity/tags NLP, Evaluating NLP N-gram matching For two sequences $x$ and $\hat{x}$, we denote their sequences of $n$-grams by $S_x^n$ and $S^n_{\hat{x}}$. The number of matched $n$-grams between the two sequences is: \[ \sum_{w \in S_{\hat{x}}^n} \mathbb{I}[w \in S_{x}^n ] \] with $\mathbb{I}$ the indicator function.
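The matched-count sum can be sketched in a few lines of Python, assuming whitespace-tokenized sentences. Exact precision divides this count by the number of $n$-grams in $\hat{x}$; exact recall swaps the roles of $x$ and $\hat{x}$ and divides by the number of $n$-grams in $x$. The function names are illustrative, and returning 0.0 when a sentence is shorter than $n$ is an assumption of this sketch.

```python
from collections.abc import Sequence

NGram = tuple[str, ...]


def ngrams(tokens: Sequence[str], n: int) -> list[NGram]:
    """The sequence S^n of contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def matched_ngrams(ref: Sequence[str], cand: Sequence[str], n: int) -> int:
    """Sum over w in S_cand^n of the indicator [w in S_ref^n]."""
    ref_set = set(ngrams(ref, n))  # only membership in S_ref^n matters
    return sum(1 for w in ngrams(cand, n) if w in ref_set)


def exact_precision(x: Sequence[str], x_hat: Sequence[str], n: int) -> float:
    """Matched n-grams of x_hat divided by |S_{x_hat}^n|."""
    total = len(ngrams(x_hat, n))
    return matched_ngrams(x, x_hat, n) / total if total else 0.0


def exact_recall(x: Sequence[str], x_hat: Sequence[str], n: int) -> float:
    """Roles swapped: matched n-grams of x divided by |S_x^n|."""
    total = len(ngrams(x, n))
    return matched_ngrams(x_hat, x, n) / total if total else 0.0


x = "the cat sat on the mat".split()
x_hat = "the cat is on the mat".split()
print(matched_ngrams(x, x_hat, 2))  # 3 shared bigrams
print(exact_precision(x, x_hat, 2), exact_recall(x, x_hat, 2))
```

Because $S_{\hat{x}}^n$ is a sequence rather than a set, an $n$-gram repeated in $\hat{x}$ is counted each time it appears; only its membership in $S_x^n$ is tested.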
From this we can construct the exact match precision (Exact-$P_n$) and recall (Exact-$R_n$): \[ \text{Exact-}P_n = \frac{\sum_{w \in S_{\hat{x}}^n} \mathbb{I}[w \in S_{x}^n ]}{| S_{\hat{x}}^n|} \] and \[ \text{Exact-}R_n = \frac{\sum_{w \in S_{x}^n} \mathbb{I}[w \in S_{\hat{x}}^n ]}{| S_{x}^n|} \]Machine translationhttps://hugocisneros.com/notes/machine_translation/Thu, 04 Aug 2022 13:31:00 +0200https://hugocisneros.com/notes/machine_translation/ tags NLPBenford's lawhttps://hugocisneros.com/notes/benford_s_law/Thu, 04 Aug 2022 13:20:00 +0200https://hugocisneros.com/notes/benford_s_law/tags Statistics A set of numbers satisfies Benford’s law if the leading digits of these numbers occur with a probability that decreases logarithmically with the digit. More precisely, for $d \in \{1, \ldots, 9\}$, \[ P(d) = \log_{10} \left(1 + \frac{1}{d} \right) \] Many sequences that span multiple orders of magnitude satisfy Benford’s law, including the Fibonacci sequence (Washington 1981). The law has been proposed for use in fraud detection, because fabricated numbers with roughly uniformly distributed leading digits would not follow it.t-SNEhttps://hugocisneros.com/notes/t_sne/Tue, 02 Aug 2022 14:03:00 +0200https://hugocisneros.com/notes/t_sne/tags Machine learning paper (Van der Maaten, Hinton 2008) Example: Embedding the vertices of a high-dimensional cube in 2D We first create a cube with dimension 12:

```python
import imageio
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# This creates a numpy array with the vertices of an N-dimensional cube:
# row i holds the N bits of i, mapped to -1/+1 coordinates.
# From https://stackoverflow.com/a/52229558
cube = lambda N: 2*((np.arange(2**N)[:,None] & (1 << np.arange(N))) > 0) - 1

# Create a 12-hypercube's vertices
cube_arr = cube(12)
steps = []
# We run t-SNE with 250 to 5000 iterations.
```

Adversarial exampleshttps://hugocisneros.com/notes/adversarial_examples/Mon, 01 Aug 2022 17:29:00 +0200https://hugocisneros.com/notes/adversarial_examples/tags Machine learning, Neural networks Adversarial examples in Reinforcement learning Adversarial examples in Computer vision Adversarial examples in NLP A Python library for creating and using text attacks: TextAttack. Figure 1: This diagram illustrates the standard flow of an adversarial attack on text data. The three components of a text adversarial example: Goal function: This is a function that takes an original sentence and an attacked sentence, and computes a score and the result of the attack (successful or not).Snake in the tunnelhttps://hugocisneros.com/notes/snake_in_the_tunnel/Mon, 01 Aug 2022 16:21:00 +0200https://hugocisneros.com/notes/snake_in_the_tunnel/ tags Economics, Europe resources WikipediaEuropehttps://hugocisneros.com/notes/europe/Mon, 01 Aug 2022 15:23:00 +0200https://hugocisneros.com/notes/europe/GPThttps://hugocisneros.com/notes/gpt/Wed, 27 Jul 2022 12:19:00 +0200https://hugocisneros.com/notes/gpt/ tags Transformers, NLP paper (Radford et al. 2018) Successors The GPT architecture was improved upon and extended into GPT-2 and GPT-3. The original “GPT-1” was quickly abandoned in favor of its successor, but GPT is still used to refer to this family of models. Parameter count 117M Bibliography Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. 2018. "Improving Language Understanding by Generative Pre-training". OpenAI.Gatohttps://hugocisneros.com/notes/gato/Wed, 27 Jul 2022 12:12:00 +0200https://hugocisneros.com/notes/gato/ tags Transformers, Reinforcement learning paper (Reed et al.
2022) Architecture A standard decoder-only transformer is preceded by an embedding layer that embeds text and images with positional encoding and spatial information if available. Parameter count 1.2B Bibliography Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, et al. May 12, 2022. "A Generalist Agent". https://arxiv.org/abs/2205.06175v2.XLNethttps://hugocisneros.com/notes/xlnet/Wed, 27 Jul 2022 12:06:00 +0200https://hugocisneros.com/notes/xlnet/ tags Transformers, Transformer-XL, NLP paper (Yang et al. 2020) Architecture The model adapts Transformer-XL to be a permutation-based language model. Parameter count Base = 117M Large = 360M Bibliography Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. January 2, 2020. "Xlnet: Generalized Autoregressive Pretraining for Language Understanding". arXiv. DOI.XLM-RoBERTahttps://hugocisneros.com/notes/xlm_roberta/Wed, 27 Jul 2022 12:04:00 +0200https://hugocisneros.com/notes/xlm_roberta/ tags Transformers, RoBERTa, NLP paper (Conneau et al. 2020) Architecture The model is an extension of RoBERTa that applies small parameter-tuning insights to multilingual applications. Parameter count Base = 270M Large = 550M Bibliography Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov. April 7, 2020. "Unsupervised Cross-lingual Representation Learning at Scale". arXiv. DOI.Wu Dao 2.0https://hugocisneros.com/notes/wu_dao_2_0/Wed, 27 Jul 2022 11:52:00 +0200https://hugocisneros.com/notes/wu_dao_2_0/tags Transformers, NLP website Wikipedia page for Wu Dao Architecture It is similar to GPT, being a decoder architecture, but it uses a different pre-training task.
Parameter count 1.75TTuring-NLGhttps://hugocisneros.com/notes/turing_nlg/Wed, 27 Jul 2022 11:48:00 +0200https://hugocisneros.com/notes/turing_nlg/tags Transformers, GPT, NLP website Microsoft Project Turing Architecture The architecture is similar to GPT-2 and GPT-3, with some parameter optimizations and a dedicated software/hardware platform to improve training. Parameter count 17B originally, now up to 530B.Vision transformerhttps://hugocisneros.com/notes/vision_transformer/Wed, 27 Jul 2022 11:46:00 +0200https://hugocisneros.com/notes/vision_transformer/ tags Transformers, Computer vision, BERT paper (Dosovitskiy et al. 2021) Architecture It is an extension of the BERT architecture that operates on patches of images. Parameter count 86M to 632M Bibliography Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, et al. June 3, 2021. "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale". arXiv. DOI.Transformer-XLhttps://hugocisneros.com/notes/transformer_xl/Wed, 27 Jul 2022 11:42:00 +0200https://hugocisneros.com/notes/transformer_xl/ tags Transformers, NLP paper (Dai et al. 2019) Architecture This model uses relative positional embeddings to enable attention over longer contexts than the vanilla Transformer. Parameter count 151M Bibliography Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. June 2, 2019. "Transformer-xl: Attentive Language Models Beyond a Fixed-length Context". arXiv. DOI.Trajectory transformerhttps://hugocisneros.com/notes/trajectory_transformer/Wed, 27 Jul 2022 11:41:00 +0200https://hugocisneros.com/notes/trajectory_transformer/ tags Transformers, Reinforcement learning, GPT paper (Janner et al. 2021) Architecture It is a model similar to Decision transformer, with some additional techniques to encode a trajectory. Bibliography Michael Janner, Qiyang Li, Sergey Levine. November 28, 2021.
"Offline Reinforcement Learning as One Big Sequence Modeling Problem". arXiv. DOI.T5https://hugocisneros.com/notes/t5/Wed, 27 Jul 2022 11:28:00 +0200https://hugocisneros.com/notes/t5/ tags Transformers, NLP paper (Raffel et al. 2020) Architecture It is the same as the original transformer, with a form of relative positional embedding added (similar to Transformer-XL). Parameter count 11B Bibliography Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. July 28, 2020. "Exploring the Limits of Transfer Learning with a Unified Text-to-text Transformer". arXiv. DOI.Switch transformerhttps://hugocisneros.com/notes/switch_transformer/Wed, 27 Jul 2022 11:06:00 +0200https://hugocisneros.com/notes/switch_transformer/ tags Transformers, T5, NLP paper (Fedus et al. 2022) Architecture This model increases the parameter count of a T5-like architecture while allowing efficient routing through different experts in a mixture of experts. Parameter count 1T Bibliography William Fedus, Barret Zoph, Noam Shazeer. June 16, 2022. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". arXiv. DOI.Swin Transformerhttps://hugocisneros.com/notes/swin_transformer/Wed, 27 Jul 2022 11:04:00 +0200https://hugocisneros.com/notes/swin_transformer/ tags Transformers, ViT, Computer vision paper (Liu et al. 2021) Architecture This model extends ViT by replacing the multi-head self-attention with a “shifted windows” module, allowing it to work with higher-resolution images. Parameter count 29M - 197M Bibliography Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. August 17, 2021. "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows". arXiv. DOI.SeeKerhttps://hugocisneros.com/notes/seeker/Wed, 27 Jul 2022 11:01:00 +0200https://hugocisneros.com/notes/seeker/tags Transformers, GPT paper (Shuster et al.
2022) Architecture This is an extension that can be applied to any Transformer model by introducing “search”, “knowledge”, and “response” modules during pre-training of the model. It has the same applications as the base model it extends. Parameter count Depends on the base model being extended. Bibliography Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston. March 29, 2022. "RoBERTahttps://hugocisneros.com/notes/roberta/Wed, 27 Jul 2022 10:46:00 +0200https://hugocisneros.com/notes/roberta/ tags Transformers, BERT, NLP paper (Liu et al. 2019) Architecture This is an extension of BERT with more data and a better optimized training procedure. Parameter count 356M Bibliography Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. July 26, 2019. "Roberta: A Robustly Optimized BERT Pretraining Approach". arXiv. http://arxiv.org/abs/1907.11692.Pegasushttps://hugocisneros.com/notes/pegasus/Wed, 27 Jul 2022 10:45:00 +0200https://hugocisneros.com/notes/pegasus/ tags Transformers, NLP paper (Zhang et al. 2020) Architecture This is a standard encoder/decoder architecture with a special pre-training task suited for summarization of text. Parameter count Base = 223M Large = 568M Bibliography Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu. July 10, 2020. "PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization". arXiv. http://arxiv.org/abs/1912.08777.PaLMhttps://hugocisneros.com/notes/palm/Wed, 27 Jul 2022 10:43:00 +0200https://hugocisneros.com/notes/palm/tags Transformers, NLP paper (Chowdhery et al. 
2022) Architecture This is a standard decoder-only architecture with some specific extensions: SwiGLU activation functions, parallel layers, multi-query attention, RoPE embeddings, shared input-output embeddings, no biases, and a 256k SentencePiece vocabulary generated from the training data. Parameter count 540B Bibliography Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, et al. April 19, 2022. "Palm: Scaling Language Modeling with Pathways"OPT: Open Pre-trained Transformerhttps://hugocisneros.com/notes/opt/Wed, 27 Jul 2022 10:40:00 +0200https://hugocisneros.com/notes/opt/ tags Transformers, GPT, NLP paper (Zhang et al. 2022) Architecture It is the same architecture as GPT-3 but with some training improvements from Megatron. Parameter count 175B Bibliography Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, et al. June 21, 2022. "OPT: Open Pre-trained Transformer Language Models". arXiv. http://arxiv.org/abs/2205.01068.Minervahttps://hugocisneros.com/notes/minerva/Tue, 26 Jul 2022 15:21:00 +0200https://hugocisneros.com/notes/minerva/ tags Transformers, Mathematics, PaLM paper (Lewkowycz et al. 2022) Architecture This model is PaLM fine-tuned on mathematical datasets. Parameter count 540B Bibliography Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, et al. June 30, 2022. "Solving Quantitative Reasoning Problems with Language Models". arXiv. http://arxiv.org/abs/2206.14858.Megatronhttps://hugocisneros.com/notes/megatron/Tue, 26 Jul 2022 15:18:00 +0200https://hugocisneros.com/notes/megatron/ tags Transformers, GPT, BERT, T5 paper (Shoeybi et al. 2020) Architecture The principle of Megatron is to scale existing architectures using model parallelism, splitting individual layers across several GPUs. Its number of parameters depends on the base model used.
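The model-parallelism principle can be illustrated on a transformer MLP block, which the Megatron-LM paper splits across GPUs by partitioning the first weight matrix by columns and the second by rows. Each device then computes its share without communicating, and a single sum (an all-reduce on real hardware) recombines the outputs. The sketch below only simulates this in NumPy on one machine, with "devices" being list entries; the function names are hypothetical and this is not Megatron's code.

```python
import numpy as np


def gelu(z: np.ndarray) -> np.ndarray:
    # tanh approximation of GeLU; any elementwise activation works here
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))


def mlp_serial(X: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Reference MLP block on a single device: GeLU(X A) B."""
    return gelu(X @ A) @ B


def mlp_tensor_parallel(X: np.ndarray, A: np.ndarray, B: np.ndarray,
                        n_devices: int) -> np.ndarray:
    """Simulated tensor-parallel MLP split over n_devices."""
    A_parts = np.split(A, n_devices, axis=1)  # column split of A
    B_parts = np.split(B, n_devices, axis=0)  # matching row split of B
    # GeLU is elementwise, so each column slice of X A is handled independently.
    partial = [gelu(X @ A_i) @ B_i for A_i, B_i in zip(A_parts, B_parts)]
    # Summing the partial outputs plays the role of the all-reduce.
    return np.sum(partial, axis=0)


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))   # 8 tokens, hidden size 16
A = rng.standard_normal((16, 64))  # hidden -> 4x hidden
B = rng.standard_normal((64, 16))  # 4x hidden -> hidden
print(np.allclose(mlp_serial(X, A, B), mlp_tensor_parallel(X, A, B, 4)))  # → True
```

The column-then-row split is what avoids communication between the two matrix products: splitting the first matrix by rows instead would require summing before the nonlinearity.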
Bibliography Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. March 13, 2020. "Megatron-lm: Training Multi-billion Parameter Language Models Using Model Parallelism". arXiv. DOI.mBARThttps://hugocisneros.com/notes/mbart/Tue, 26 Jul 2022 15:15:00 +0200https://hugocisneros.com/notes/mbart/ tags Transformers, NLP, BART paper (Liu et al. 2020) Architecture It’s an encoder-decoder architecture based on BART. Bibliography Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. January 23, 2020. "Multilingual Denoising Pre-training for Neural Machine Translation". arXiv. DOI.LaMDAhttps://hugocisneros.com/notes/lamda/Tue, 26 Jul 2022 11:53:00 +0200https://hugocisneros.com/notes/lamda/ tags Transformers, NLP paper (Thoppilan et al. 2022) Parameter count 137B Bibliography Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, et al. February 10, 2022. "Lamda: Language Models for Dialog Applications". arXiv. http://arxiv.org/abs/2201.08239.Jurassic-1https://hugocisneros.com/notes/jurassic_1/Tue, 26 Jul 2022 11:46:00 +0200https://hugocisneros.com/notes/jurassic_1/ tags Transformers, GPT, NLP blog post AI21Labs blog Architecture This model is similar to GPT-3, with an improved tokenizer that increases learning efficiency. It also has more parameters than GPT-3. Parameter count 178B BibliographyImagenhttps://hugocisneros.com/notes/imagen/Tue, 26 Jul 2022 11:41:00 +0200https://hugocisneros.com/notes/imagen/tags Transformers, Diffusion models, Computer vision, NLP, T5, CLIP paper (Saharia et al. 2022) Architecture This is based on the U-net diffusion architecture with a few extensions. A frozen T5, CLIP or BERT model is used as the text encoder. Parameter count 2B Bibliography Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, et al. May 23, 2022.
"Photorealistic Text-to-image Diffusion Models with Deep Language Understanding"GPTInstructhttps://hugocisneros.com/notes/gptinstruct/Tue, 26 Jul 2022 11:10:00 +0200https://hugocisneros.com/notes/gptinstruct/ tags Transformers, GPT, NLP paper (Ouyang et al. 2022) Architecture This model starts off from a pretrained GPT-3, which is then fine-tuned with reinforcement learning against a reward model learned from human feedback. Parameter count 175B Bibliography Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, et al. March 4, 2022. "Training Language Models to Follow Instructions with Human Feedback". arXiv. DOI.GPT-Neohttps://hugocisneros.com/notes/gpt_neo/Tue, 26 Jul 2022 11:06:00 +0200https://hugocisneros.com/notes/gpt_neo/ tags Transformers, GPT, NLP software gpt-neo Architecture This model is very similar to GPT-2, with the addition of local attention in every other layer, using a window size of 256 tokens. Parameter count 1.5B, 2.7B (XL) BibliographyGlobal context ViThttps://hugocisneros.com/notes/global_context_vit/Tue, 26 Jul 2022 11:04:00 +0200https://hugocisneros.com/notes/global_context_vit/ tags Transformers, Computer vision, ViT paper (Hatamizadeh et al. 2022) Architecture This is a hierarchical version of ViT with both local and global attention. Parameter count 90M Bibliography Ali Hatamizadeh, Hongxu Yin, Jan Kautz, Pavlo Molchanov. June 20, 2022. "Global Context Vision Transformers". arXiv. DOI.GLIDEhttps://hugocisneros.com/notes/glide/Tue, 26 Jul 2022 11:02:00 +0200https://hugocisneros.com/notes/glide/ tags Diffusion models, NLP, Computer vision paper (Nichol et al. 2022) Architecture This model uses a diffusion model conditioned on joint text and image embeddings, followed by an upsampling stage. Parameter count 3.5B Bibliography Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen. March 8, 2022. "GLIDE: Towards Photorealistic Image Generation and Editing with Text-guided Diffusion Models". arXiv.
DOI.GPT-3https://hugocisneros.com/notes/gpt_3/Tue, 26 Jul 2022 10:06:00 +0200https://hugocisneros.com/notes/gpt_3/ tags Transformers, NLP, GPT paper (Brown et al. 2020) Architecture Like GPT-2, with the addition of alternating dense and locally banded sparse attention patterns. Parameter count 175B Bibliography Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. June 4, 2020. "Language Models Are Few-shot Learners". Arxiv:2005.14165 [cs]. http://arxiv.org/abs/2005.14165.GPT-2https://hugocisneros.com/notes/gpt_2/Tue, 26 Jul 2022 10:04:00 +0200https://hugocisneros.com/notes/gpt_2/ tags Transformers, GPT paper (Radford et al. 2019) Architecture Some minor changes from GPT, such as a larger context window and layer normalization moved to the input of each sub-block. Parameter count 1.5B Bibliography Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. 2019. "Language Models Are Unsupervised Multitask Learners". Openai Blog 1 (8):9.GLaMhttps://hugocisneros.com/notes/glam/Tue, 26 Jul 2022 10:01:00 +0200https://hugocisneros.com/notes/glam/tags Transformers, NLP paper (Du et al. 2021) Architecture The model is a decoder-only transformer with mixture-of-experts layers containing 64 experts. Two experts are activated per token, making the model relatively efficient for its number of parameters. Parameter count 1.2T total, 96B active per token. Bibliography Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, et al. December 13, 2021. "Glam: Efficient Scaling of Language Models with Mixture-of-experts"Flamingohttps://hugocisneros.com/notes/flamingo/Tue, 26 Jul 2022 09:56:00 +0200https://hugocisneros.com/notes/flamingo/ tags Transformers, Computer vision, NLP, Chinchilla paper (Alayrac et al. 2022) Architecture Uses a frozen language model (e.g. Chinchilla) that is conditioned on a visual representation computed by a normalizer-free ResNet.
Parameter count 80B Bibliography Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, et al. April 29, 2022. "Flamingo: A Visual Language Model for Few-shot Learning". arXiv. http://arxiv.org/abs/2204.14198.ERNIEhttps://hugocisneros.com/notes/ernie/Tue, 26 Jul 2022 09:51:00 +0200https://hugocisneros.com/notes/ernie/ tags Transformers, BERT, NLP paper (Zhang et al. 2019) Architecture This transformer uses two stacked BERT-style encoders: one for the text, one for the entities of a knowledge graph. Parameter count 114M Bibliography Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu. June 4, 2019. "ERNIE: Enhanced Language Representation with Informative Entities". arXiv. DOI.ELECTRAhttps://hugocisneros.com/notes/electra/Tue, 26 Jul 2022 09:08:00 +0200https://hugocisneros.com/notes/electra/ tags Transformers, NLP paper (Clark et al. 2020) Parameter count Base = 110M Large = 330M Bibliography Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. 2020. "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators". In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=r1xMH1BtvB.DQ-BARThttps://hugocisneros.com/notes/dq_bart/Tue, 26 Jul 2022 09:05:00 +0200https://hugocisneros.com/notes/dq_bart/ tags Transformers, BART, NLP paper (Li et al. 2022) Architecture It is a distilled and quantized version of BART. It improves efficiency and reduces the model size. Bibliography Zheng Li, Zijian Wang, Ming Tan, Ramesh Nallapati, Parminder Bhatia, Andrew Arnold, Bing Xiang, Dan Roth. March 21, 2022. "DQ-BART: Efficient Sequence-to-sequence Model via Joint Distillation and Quantization". arXiv. DOI.DistilBERThttps://hugocisneros.com/notes/distillbert/Tue, 26 Jul 2022 08:43:00 +0200https://hugocisneros.com/notes/distillbert/ tags Transformers, BERT, NLP paper (Sanh et al.
2020) Architecture It is a distilled version of BERT that is much more efficient. Parameter count 66M Bibliography Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. February 29, 2020. "Distilbert, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter". arXiv. DOI.DialoGPThttps://hugocisneros.com/notes/dialogpt/Fri, 22 Jul 2022 13:07:00 +0200https://hugocisneros.com/notes/dialogpt/ tags GPT, Transformers, NLP paper (Zhang et al. 2020) Architecture It is exactly like a GPT-2 architecture but trained on dialog data. Parameter count 1.5B Bibliography Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan. May 2, 2020. "Dialogpt: Large-scale Generative Pre-training for Conversational Response Generation". arXiv. DOI.Decision transformerhttps://hugocisneros.com/notes/decision_transformer/Fri, 22 Jul 2022 13:03:00 +0200https://hugocisneros.com/notes/decision_transformer/ tags Transformers, GPT, Reinforcement learning paper (Chen et al. 2021) Architecture This is a decoder model that uses a GPT-like model to encode and predict trajectories for Reinforcement learning tasks. It has essentially the same characteristics as GPT. Bibliography Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. June 24, 2021. "Decision Transformer: Reinforcement Learning via Sequence Modeling". arXiv. DOI.ALBERThttps://hugocisneros.com/notes/albert/Fri, 22 Jul 2022 13:02:00 +0200https://hugocisneros.com/notes/albert/tags Transformers, BERT, NLP paper (Lan et al. 2020) Architecture It is an encoder-only architecture. It extends BERT by using parameter-sharing and is more efficient than BERT with the same number of parameters. Parameter count Base = 12M Large = 18M XLarge = 60M Bibliography Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. February 8, 2020. 
"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"BERThttps://hugocisneros.com/notes/bert/Fri, 22 Jul 2022 13:02:00 +0200https://hugocisneros.com/notes/bert/ tags Transformers, NLP paper (Devlin et al. 2019) Parameter count Base = 110M Large = 340M Bibliography Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. May 24, 2019. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv. DOI.BLOOMhttps://hugocisneros.com/notes/bloom/Fri, 22 Jul 2022 13:02:00 +0200https://hugocisneros.com/notes/bloom/ tags Transformers, GPT, NLP blog post BLOOM announcement blog post Architecture It is similar to the architecture of GPT-3, using full attention instead of sparse attention. Parameter count 176B BibliographyCTRLhttps://hugocisneros.com/notes/ctrl/Fri, 22 Jul 2022 13:02:00 +0200https://hugocisneros.com/notes/ctrl/tags Transformers, NLP paper (Keskar et al. 2019) Architecture This is a model that can generate text conditioned on control codes that specify the domain, style, topics, dates, entities, relationships between entities, plot points, and task-related behavior of the text. Parameter count 1.63B Bibliography Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher. September 20, 2019. "CTRL: A Conditional Transformer Language Model for Controllable Generation". arXiv. DOI.Big birdhttps://hugocisneros.com/notes/big_bird/Fri, 22 Jul 2022 13:01:00 +0200https://hugocisneros.com/notes/big_bird/tags Transformers, NLP paper (Zaheer et al. 2021) Architecture Big bird can be used as both an encoder-only and an encoder/decoder architecture. It extends the likes of BERT with a sparse attention mechanism, making the computational complexity of attention linear rather than quadratic in the sequence length. Bibliography Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, et al. January 8, 2021. "Big Bird: Transformers for Longer Sequences". arXiv.
DOI.DALL-E-2https://hugocisneros.com/notes/dall_e_2/Fri, 22 Jul 2022 13:00:00 +0200https://hugocisneros.com/notes/dall_e_2/ tags Transformers, Diffusion models, CLIP paper (Ramesh et al. 2022) Architecture This is the successor of DALL-E. It is an encoder/decoder model that combines CLIP and diffusion models to generate images from text. The diffusion decoder is similar to GLIDE. Parameter count 3.5B Bibliography Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. April 12, 2022. "Hierarchical Text-conditional Image Generation with CLIP Latents". arXiv. DOI.DALL-Ehttps://hugocisneros.com/notes/dall_e/Fri, 22 Jul 2022 12:53:00 +0200https://hugocisneros.com/notes/dall_e/ tags Transformers, GPT paper (Ramesh et al. 2021) Architecture It is a decoder architecture that combines a discrete variational autoencoder with a variant of GPT-3 to convert text to images. Parameter count 12B Bibliography Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever. February 26, 2021. "Zero-shot Text-to-image Generation". arXiv. DOI.CLIPhttps://hugocisneros.com/notes/clip/Fri, 22 Jul 2022 12:29:00 +0200https://hugocisneros.com/notes/clip/ tags Transformers, NLP, Computer vision paper (Radford et al. 2021) Architecture It is an encoder-only model that uses either a ResNet or a ViT to encode images and a transformer to encode text. Bibliography Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, et al. February 26, 2021. "Learning Transferable Visual Models from Natural Language Supervision". arXiv. DOI.Residual neural networkshttps://hugocisneros.com/notes/residual_networks/Fri, 22 Jul 2022 12:28:00 +0200https://hugocisneros.com/notes/residual_networks/tags Neural networks, Convolutional neural networks, Computer vision resources (He et al.
2016) Residual neural networks are neural networks with skip connections (also called shortcuts or residual connections) that bypass some of the network's layers. Highway networks (Srivastava et al. 2015), DenseNets (Huang, Liu n.d.) Bibliography Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. June 2016. "Deep Residual Learning for Image Recognition". In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78.Chinchillahttps://hugocisneros.com/notes/chinchilla/Fri, 22 Jul 2022 12:27:00 +0200https://hugocisneros.com/notes/chinchilla/ tags Transformers, GPT, NLP paper (Hoffmann et al. 2022) Architecture This model is very similar to Gopher, but it is smaller and trained on more tokens for a similar compute budget. Parameter count 70B Bibliography Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, et al. March 29, 2022. "Training Compute-optimal Large Language Models". arXiv. DOI.Positional encodinghttps://hugocisneros.com/notes/positional_encoding/Fri, 22 Jul 2022 11:57:00 +0200https://hugocisneros.com/notes/positional_encoding/ tags Transformers, AttentionBARThttps://hugocisneros.com/notes/bart/Fri, 22 Jul 2022 10:11:00 +0200https://hugocisneros.com/notes/bart/ tags Transformers paper (Lewis et al. 2019) Architecture It is an encoder/decoder architecture. The encoder is based on BERT and the decoder is based on GPT. It generalizes the two models into a single one. Bibliography Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. October 29, 2019. "BART: Denoising Sequence-to-sequence Pre-training for Natural Language Generation, Translation, and Comprehension". arXiv.
DOI.Byte-pair encodinghttps://hugocisneros.com/notes/byte_pair_encoding/Thu, 14 Jul 2022 09:34:00 +0200https://hugocisneros.com/notes/byte_pair_encoding/tags NLP The process of byte-pair encoding can be summarized as follows: (1) each character starts as a token; (2) find the pair of adjacent tokens that occurs most often; (3) create a new token that encodes this pair; (4) repeat until the target vocabulary size is reached. The output of this process is both a vocabulary and a set of merge rules that can be applied to tokenize new data. This technique has several advantages:Land-value taxhttps://hugocisneros.com/notes/land_value_tax/Wed, 15 Jun 2022 13:25:00 +0200https://hugocisneros.com/notes/land_value_tax/ tags Economics, Taxation resources WikipediaTaxationhttps://hugocisneros.com/notes/taxation/Wed, 15 Jun 2022 13:20:00 +0200https://hugocisneros.com/notes/taxation/ tags EconomicsMotion planninghttps://hugocisneros.com/notes/motion_planning/Wed, 15 Jun 2022 13:02:00 +0200https://hugocisneros.com/notes/motion_planning/ tags Artificial IntelligencePytorchhttps://hugocisneros.com/notes/pytorch/Tue, 07 Jun 2022 11:53:00 +0200https://hugocisneros.com/notes/pytorch/tags Python, Machine learning PyTorch is an automatic differentiation (autodiff) library for machine learning in Python. PyTorch tricks I don’t know who originally made this list. I also don’t know how many of those have been addressed in recent versions. If some of these tricks are no longer valid, let me know: DataLoader has bad default settings: set num_workers > 0 and default to pin_memory = True. Use torch.backends.cudnn.benchmark = True to autotune the cuDNN kernel choice. Max out the batch size for each GPU to amortize compute.Article: Uncertain timeshttps://hugocisneros.com/notes/article_uncertain_times/Tue, 07 Jun 2022 09:11:00 +0200https://hugocisneros.com/notes/article_uncertain_times/authors Melanie Mitchell, Jessica Flack source Aeon tags Complex Systems This article is about adopting a complex systems-based view of societal phenomena.
All human societies are collectives of individuals with coupled behaviors. As a result, large-scale, society-wide information and local behaviors are coupled. This coupling can be appealing, but it also leads to surprising behavior. In complex systems, noise feeding back onto itself can lead to transitions to orderly states. In practice, this means that individual decisions that are positive but slightly random can result in negative society-scale behavior.Open-ended Evolutionhttps://hugocisneros.com/notes/open_ended_evolution/Mon, 06 Jun 2022 16:03:00 +0200https://hugocisneros.com/notes/open_ended_evolution/tags Evolution, Complexity, Artificial Intelligence resources Open-endedness: The last grand challenge you’ve never heard of Objectives and open-endedness Objective functions are commonly used in all areas of machine learning (even so-called unsupervised learning, or evolutionary strategies). However, this seems to be fundamentally opposed to how natural evolution proceeds. Innovations appear without a priori objectives, and it isn’t clear whether setting such objectives for ourselves slows down progress by focusing on paths to a solution that are too narrow (Woolley, Stanley 2011; Stanley, Lehman 2015).Wokehttps://hugocisneros.com/notes/woke/Thu, 02 Jun 2022 10:30:00 +0200https://hugocisneros.com/notes/woke/tags Society Genealogy of the term From (Cammaerts 2022): Woke is intrinsically tied to black consciousness and anti-racist struggles. It was a black slang word which was first referenced in popular culture during a spoken word section at the end of a recording of the 1938 protest folksong ‘Scottsboro Boys’ by Lead Belly.
The song refers to the gruesome case of nine black youth who were falsely accused of raping two white women and whose lives were destroyed by the deeply racist Alabama justice system (Cose 2020).Economicshttps://hugocisneros.com/notes/economics/Thu, 02 Jun 2022 10:22:00 +0200https://hugocisneros.com/notes/economics/ tags SocietySocietyhttps://hugocisneros.com/notes/society/Thu, 02 Jun 2022 10:22:00 +0200https://hugocisneros.com/notes/society/Asset economyhttps://hugocisneros.com/notes/asset_economy/Wed, 01 Jun 2022 22:23:00 +0200https://hugocisneros.com/notes/asset_economy/ tags Economics books (Adkins et al. 2020) Bibliography Lisa Adkins, Martijn Konings, Melinda Cooper. 2020. The Asset Economy: Property Ownership and the New Logic of Inequality. Medford: Polity Press.Extractivismhttps://hugocisneros.com/notes/extractivism/Wed, 01 Jun 2022 21:44:00 +0200https://hugocisneros.com/notes/extractivism/tags Climate, Economics From (Chagnon et al. 2022), a piece on extractivism as a concept and its relation to globalization: Extractivism as a concept forms a complex ensemble of self-reinforcing practices, mentalities, and power differentials underwriting and rationalizing socio-ecologically destructive modes of organizing life through subjugation, violence, depletion, and non-reciprocity. Bibliography Christopher W. Chagnon, Francesco Durante, Barry K. Gills, Sophia E. Hagolani-Albov, Saana Hokkanen, Sohvi M. J. 
Kangasluoma, Heidi Konttinen, et al.Globalizationhttps://hugocisneros.com/notes/globalization/Wed, 01 Jun 2022 20:53:00 +0200https://hugocisneros.com/notes/globalization/ tags Economics, Economic liberalism, Complex SystemsGreenhouse gas emissionshttps://hugocisneros.com/notes/greenhouse_gas_emissions/Wed, 01 Jun 2022 20:37:00 +0200https://hugocisneros.com/notes/greenhouse_gas_emissions/tags Climate Scopes Greenhouse gas emissions are often classified into three scopes, which roughly correspond to how “direct” the emission is, i.e. how close the actual source emitting the gas is. Scope 1 covers direct emissions from fuel combustion and direct use of fossil fuels. Scope 2 corresponds to indirect emissions from energy use. This includes any electricity that was produced from a fossil fuel source. Scope 3 covers all other indirect emissions anywhere in the value chain.Semantic primeshttps://hugocisneros.com/notes/semantic_primes/Wed, 01 Jun 2022 16:12:00 +0200https://hugocisneros.com/notes/semantic_primes/ tags Language resources WikipediaThe Scaling Hypothesishttps://hugocisneros.com/notes/the_scaling_hypothesis/Tue, 31 May 2022 15:22:00 +0200https://hugocisneros.com/notes/the_scaling_hypothesis/tags Artificial Intelligence, Neural networks From gwern’s website: The scaling hypothesis: neural nets absorb data & compute, generalizing and becoming more Bayesian as problems get harder, manifesting new abilities even at trivial-by-global-standards-scale.Projection on convex setshttps://hugocisneros.com/notes/projection_on_convex_sets/Wed, 25 May 2022 13:27:00 +0200https://hugocisneros.com/notes/projection_on_convex_sets/tags Optimization To solve the problem of finding $x \in \mathbb{R}^n$ such that $x\in C \cap D$, where $C$ and $D$ are closed convex sets with a nonempty intersection, we alternately project a candidate solution onto $D$ and $C$ until it converges to a point in the intersection. \[ x_{k+1} = \mathcal{P}_C (\mathcal{P}_D (x_k)) \]Notes on: Memorizing Transformers by Wu, Y., Rabe, M.
N., Hutchins, D., & Szegedy, C. (2022)https://hugocisneros.com/notes/wumemorizingtransformers2022/Wed, 25 May 2022 13:26:00 +0200https://hugocisneros.com/notes/wumemorizingtransformers2022/source (Wu et al. 2022) tags Transformers, Memory in neural networks Summary This paper introduces a method to extend the classical Transformer neural network model with an addressable memory that can be queried and updated at inference time. This memory is a set of cached attention (key, value) vector pairs, and it is addressed using an attention mechanism. The memory mechanism is inserted at some arbitrarily chosen depth of the attention “stack”.Memory in neural networkshttps://hugocisneros.com/notes/memory_in_neural_networks/Fri, 20 May 2022 23:38:00 +0200https://hugocisneros.com/notes/memory_in_neural_networks/ tags Neural networksNEAThttps://hugocisneros.com/notes/neat/Tue, 03 May 2022 11:27:00 +0200https://hugocisneros.com/notes/neat/ tags Neural architecture search, Evolutionary strategies, Neural networks papers (Stanley, Miikkulainen 2002) Bibliography Kenneth O. Stanley, Risto Miikkulainen. June 2002. "Evolving Neural Networks Through Augmenting Topologies". Evolutionary Computation 10 (2):99–127. DOI. See notesKerberoshttps://hugocisneros.com/notes/kerberos/Tue, 03 May 2022 10:51:00 +0200https://hugocisneros.com/notes/kerberos/tags Network authentication, Cryptography resources Main page, Computerphile video Kerberos is a centralized authentication protocol that relies mainly on symmetric encryption to authenticate users on a network with a trusted central entity (e.g. a corporate network). The central server must hold long-term keys for every user on the network.
It uses these keys to securely issue session keys for communicating with other devices on the network, via a Ticket-granting server (TGS).Network authenticationhttps://hugocisneros.com/notes/network_authentication/Tue, 03 May 2022 10:47:00 +0200https://hugocisneros.com/notes/network_authentication/Protocols Password authentication This is one of the simplest authentication methods. The idea is just to send a (username, password) pair to the server. Without a protected channel, it is obviously vulnerable to man-in-the-middle attacks. Kerberos Secure Socket Layer (SSL) and Transport Layer Security (TLS)Turing-completenesshttps://hugocisneros.com/notes/turing_completeness/Tue, 03 May 2022 10:46:00 +0200https://hugocisneros.com/notes/turing_completeness/tags Computability theory, Computer science A system is Turing complete if it can be used to simulate any Turing machine. Examples of Turing complete systems: some cellular automata; most programming languages; lambda calculus; combinatory logic; others like Post-Turing machines, formal grammars, formal languages, etc.; some games (Minecraft, Baba Is You) and computational languages (markup languages like HTML+CSS); other surprisingly Turing complete systems, which show that beyond a certain level of complexity it becomes possible to “stumble upon” Turing completeness.Artificial Intelligencehttps://hugocisneros.com/notes/artificial_intelligence/Tue, 03 May 2022 10:45:00 +0200https://hugocisneros.com/notes/artificial_intelligence/Creating artificial intelligence through evolution If we take an evolutionary approach to the creation of AI, we may run into some problems. Of course, it is an extremely appealing idea, as it has been proven to work in our “Earth experiment”. Somehow, life and intelligent behavior have emerged from the synergy between the emergence of so-called living systems, Darwinian evolution and interactions with the environment.
However, a crucial question remains: is there any shortcut in this approach?AI and climate changehttps://hugocisneros.com/notes/ai_and_climate_change/Mon, 02 May 2022 20:45:00 +0200https://hugocisneros.com/notes/ai_and_climate_change/tags Machine learning, Artificial Intelligence, Climate A general-purpose resource: (Dobbe, Whittaker 2019). A more technical resource with machine learning in mind: (Rolnick et al. 2019). Bibliography R. Dobbe, M. Whittaker. October 17, 2019. "AI and Climate Change: How They’re Connected, and What We Can Do About It". Medium (blog). October 17, 2019. https://medium.com/@AINowInstitute/ai-and-climate-change-how-theyre-connected-and-what-we-can-do-about-it-6aa8d0f5b32c. David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, et al.AI capitalismhttps://hugocisneros.com/notes/ai_capitalism/Mon, 02 May 2022 20:02:00 +0200https://hugocisneros.com/notes/ai_capitalism/ tags Capitalism, Artificial Intelligence (Verdegem 2022) Bibliography Pieter Verdegem. April 9, 2022. "Dismantling AI Capitalism: The Commons as an Alternative to the Power Concentration of Big Tech". AI & SOCIETY. DOI.Capitalismhttps://hugocisneros.com/notes/capitalism/Mon, 02 May 2022 20:01:00 +0200https://hugocisneros.com/notes/capitalism/ tags EconomicsText classificationhttps://hugocisneros.com/notes/text_classification/Mon, 02 May 2022 10:43:00 +0200https://hugocisneros.com/notes/text_classification/ tags NLP resources (Minaee et al. 2020) A few examples are often cited as major applications of text classification: spam detection, sentiment analysis, auto-tagging, and categorization into topics. Bibliography Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, Jianfeng Gao. April 5, 2020. "Deep Learning Based Text Classification: A Comprehensive Review". Arxiv:2004.03705 [cs, Stat].
http://arxiv.org/abs/2004.03705.Amorphous computinghttps://hugocisneros.com/notes/amorphous_computing/Mon, 02 May 2022 10:33:00 +0200https://hugocisneros.com/notes/amorphous_computing/tags Unconventional computing, Self-organization papers (Abelson et al. 2000) resources Wikipedia, CSAIL’s website The term amorphous computing was coined by Abelson, Knight, Sussman et al. It refers to computational systems composed of a large number of identical parallel devices (processors) with limited computational capacity. The processors interact locally, without particular knowledge of their position in the medium. From (Abelson et al. 2000): A colony of cells cooperates to form a multicellular organism under the direction of a genetic program shared by the members of the colony.Ising modelhttps://hugocisneros.com/notes/ising_model/Mon, 02 May 2022 10:25:00 +0200https://hugocisneros.com/notes/ising_model/tags Complex Systems, Physics Simulation of an Ising model by a cellular automaton Several works have proposed and later refined cellular automaton-based algorithms for simulating Ising models (Vichniac 1984; Herrmann 1986; Ottavi, Parodi 1989). Bibliography Gérard Y. Vichniac. January 1, 1984. "Simulating Physics with Cellular Automata". Physica D: Nonlinear Phenomena 10 (1):96–116. DOI. H. J. Herrmann. October 1, 1986. "Fast Algorithm for the Simulation of Ising Models".
Journal of Statistical Physics 45 (1):145–51.You and your researchhttps://hugocisneros.com/notes/you_and_your_research/Sun, 01 May 2022 20:37:00 +0200https://hugocisneros.com/notes/you_and_your_research/tags Writing source Web In the first place if you do some good work you will find yourself on all kinds of committees and unable to do any more work.The Meta-Problem of Consciousness with David Chalmershttps://hugocisneros.com/notes/the_meta_problem_of_consciousness_with_david_chalmers/Wed, 27 Apr 2022 10:30:00 +0200https://hugocisneros.com/notes/the_meta_problem_of_consciousness_with_david_chalmers/tags Consciousness link Youtube The hard problem of consciousness Why and how physical processes give rise to consciousness. This often refers to phenomenal consciousness or “what it’s like to be a subject”. Phenomenally conscious: For a system, if there is something it’s like to be it. For a mental state, if there is something it’s like to be in that state. This includes visual and other sensory experiences, bodily sensations, mental imagery, and emotions.Nixhttps://hugocisneros.com/notes/nix/Tue, 26 Apr 2022 21:20:00 +0200https://hugocisneros.com/notes/nix/tags Coding, Programming languages https://github.com/cideM/dotfilesKnowledge argumenthttps://hugocisneros.com/notes/knowledge_argument/Tue, 26 Apr 2022 18:21:00 +0200https://hugocisneros.com/notes/knowledge_argument/tags Philosophy The thought experiment as formulated by Frank Jackson in (Jackson 1982): Mary is a brilliant scientist who is, for whatever reason, forced to investigate the world from a black and white room via a black and white television monitor.
She specializes in the neurophysiology of vision and acquires, let us suppose, all the physical information there is to obtain about what goes on when we see ripe tomatoes, or the sky, and use terms like “red”, “blue”, and so on.Notes on: GARF: Gaussian Activated Radiance Fields for High Fidelity Reconstruction and Pose Estimation by Chng, S., Ramasinghe, S., Sherrah, J., & Lucey, S. (2022)https://hugocisneros.com/notes/chnggarfgaussianactivated2022/Tue, 26 Apr 2022 13:08:00 +0200https://hugocisneros.com/notes/chnggarfgaussianactivated2022/tags Implicit neural representations, NeRF source (Chng et al. 2022) web https://sfchng.github.io/garf/ Summary This paper introduces a positional embedding-free NeRF architecture which uses Gaussian activation functions. These activation functions were introduced as part of Gaussian-MLPs in (Ramasinghe, Lucey 2022). This alternative activation function enables GARF to model first derivatives of the target signal better than positional embedding MLPs (PE-MLPs) (Mildenhall et al. 2020; Sitzmann et al. 2020). It also overcomes the initialization issues with SIRENs (Sitzmann et al.Implicit neural representationshttps://hugocisneros.com/notes/implicit_neural_representations/Tue, 26 Apr 2022 12:10:00 +0200https://hugocisneros.com/notes/implicit_neural_representations/tags Data representation, Neural networks resources Sitzmann’s Awesome Implicit Neural Representations GitHub page Implicit neural representations parameterize a continuous, differentiable signal with a neural network. The signal is encoded within the neural network, providing a possibly more compact representation or allowing smooth parameter-based manipulation of that signal. Learning such a representation is a type of regression problem.
Applications of these learned representations range from simple compression to 3D scene reconstruction from 2D images, super-resolution, semantic information inference, etc.Schmidhuber on Consciousnesshttps://hugocisneros.com/notes/schmidhuber_on_consciousness/Tue, 26 Apr 2022 12:08:00 +0200https://hugocisneros.com/notes/schmidhuber_on_consciousness/tags Consciousness link Reddit comment From the Reddit comment (links are added by me): Karl Popper famously said: “All life is problem solving.” No theory of consciousness is necessary to define the objectives of a general problem solver. From an AGI point of view, consciousness is at best a by-product of a general problem solving procedure. I must admit that I am not a big fan of Tononi’s theory. The following may represent a simpler and more general view of consciousness.Consciousnesshttps://hugocisneros.com/notes/consciousness/Tue, 26 Apr 2022 10:39:00 +0200https://hugocisneros.com/notes/consciousness/ tags PhilosophyTime to thresholdhttps://hugocisneros.com/notes/time_to_threshold/Sun, 24 Apr 2022 13:20:00 +0200https://hugocisneros.com/notes/time_to_threshold/tags Transfer learning, Reinforcement learning This is a simple metric first mentioned in (Taylor et al. 2007; Taylor, Stone 2007). In the paper by Taylor, Stone, and Liu, it is defined as: Time-to-Threshold: Measure the time needed to reach a performance threshold in the target task. In other words, this metric measures the time a given learning system needs to reach a target performance. To write down this metric, we use aPattern-defeating quicksorthttps://hugocisneros.com/notes/pattern_defeating_quicksort/Sun, 24 Apr 2022 13:19:00 +0200https://hugocisneros.com/notes/pattern_defeating_quicksort/ tags Algorithm resources Youtube, (Peters 2021) This is a sorting algorithm based on the well-known quicksort algorithm.
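For reference, a plain quicksort can be sketched in Python with two of the optimizations this note lists: median-of-three pivot selection and an insertion-sort base case for short ranges. This is a minimal illustration, not pdqsort itself. The cutoff of 16 elements is an assumed value. The simple Lomuto partition used here still degrades to quadratic time on inputs with many equal values, which is one of the cases pdqsort is designed to handle.

```python
import random

INSERTION_SORT_CUTOFF = 16  # assumed threshold, not taken from pdqsort


def insertion_sort(a, lo, hi):
    """Sort a[lo..hi] (inclusive) in place; cheap for short ranges."""
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def median_of_three(a, lo, hi):
    """Order a[lo], a[mid], a[hi] in place; a[mid] then holds their median."""
    mid = (lo + hi) // 2
    if a[mid] < a[lo]:
        a[lo], a[mid] = a[mid], a[lo]
    if a[hi] < a[lo]:
        a[lo], a[hi] = a[hi], a[lo]
    if a[hi] < a[mid]:
        a[mid], a[hi] = a[hi], a[mid]
    return mid


def partition(a, lo, hi):
    """Lomuto partition around the median-of-three pivot; returns the pivot's final index."""
    mid = median_of_three(a, lo, hi)
    a[mid], a[hi] = a[hi], a[mid]  # move the pivot to the end of the range
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort(a, lo=0, hi=None):
    """Sort list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    while hi - lo + 1 > INSERTION_SORT_CUTOFF:
        p = partition(a, lo, hi)
        # Recurse on the smaller side and loop on the larger one to bound stack depth.
        if p - lo < hi - p:
            quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1
    insertion_sort(a, lo, hi)
    return a


if __name__ == "__main__":
    data = [random.randint(0, 99) for _ in range(50)]
    print(quicksort(list(data)) == sorted(data))  # True
```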
It uses a number of optimizations on top of the base algorithm: pivot selection, branchless partitioning, insertion sort base case, bounds check elimination, optimistic pre-sortedness, many equal values, breaking self-similarity, and $O(n^2)$ worst-case prevention. Bibliography Orson R. L. Peters. June 9, 2021. "Pattern-defeating Quicksort". Arxiv:2106.05123 [cs]. http://arxiv.org/abs/2106.05123.Complexity metricshttps://hugocisneros.com/notes/complexity_metrics/Wed, 20 Apr 2022 13:32:00 +0200https://hugocisneros.com/notes/complexity_metrics/tags Complexity To study the complexity of various systems, researchers have come up with many metrics. They are based on several principles, such as information theory or algorithmic information theory (AIT). Many of these metrics are described in (Grassberger 1989). Shannon entropy and Kolmogorov Complexity The paper (Grunwald, Vitányi 2004) is a great description and analysis of two of the most important complexity metrics: Shannon entropy and Kolmogorov complexity. Information-theoretic metrics Shannon entropy AIT-based metrics For a universal computer $U$, the algorithmic information of $S$ relative to $U$ is defined as the length of the shortest program that yields $S$ on $U$, i.e. $K_U(S) = \min\{\ell(p) : U(p) = S\}$.Chaos computinghttps://hugocisneros.com/notes/chaos_computing/Tue, 19 Apr 2022 17:10:00 +0200https://hugocisneros.com/notes/chaos_computing/ tags Unconventional computing resources (Munakata et al. 2002) Bibliography T. Munakata, S. Sinha, W.L. Ditto. November 2002. "Chaos Computing: Implementation of Fundamental Logical Gates by Chaotic Elements". IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 49 (11):1629–33.
DOI.PDFhttps://hugocisneros.com/notes/pdf/Tue, 19 Apr 2022 16:46:00 +0200https://hugocisneros.com/notes/pdf/ tags Writing Tips Make a PDF look scanned: convert -density 150 input.pdf -colorspace gray -linear-stretch 3.5%x10% -blur 0x0.5 -attenuate 0.25 +noise Gaussian -rotate 0.5 temp.pdf, then gs -dSAFER -dBATCH -dNOPAUSE -dNOCACHE -sDEVICE=pdfwrite -sColorConversionStrategy=LeaveColorUnchanged -dAutoFilterColorImages=true -dAutoFilterGrayImages=true -dDownsampleMonoImages=true -dDownsampleGrayImages=true -dDownsampleColorImages=true -sOutputFile=output.pdf temp.pdfCombinatory logichttps://hugocisneros.com/notes/combinatory_logic/Tue, 19 Apr 2022 16:30:00 +0200https://hugocisneros.com/notes/combinatory_logic/tags Logic papers (Cardone, Hindley 2009) It was independently invented by Moses Schönfinkel, John von Neumann and Haskell Curry. A Turing-complete basis of operators is: $I\,f \quad\triangleright\quad f$, $K\,f\,g \quad\triangleright\quad f$, $S\,f\,g\,x \quad\triangleright\quad f\,x\,(g\,x)$. Bibliography Felice Cardone, J. Roger Hindley. 2009. "Lambda-calculus and Combinators in the 20th Century". In Logic from Russell to Church, edited by Dov M. Gabbay and John Woods, 5:723–817. Handbook of the History of Logic. Elsevier. DOI.Artificial lifehttps://hugocisneros.com/notes/artificial_life/Tue, 19 Apr 2022 14:42:00 +0200https://hugocisneros.com/notes/artificial_life/Artificial life could be thought of as attempts at re-creating biological Life or other types of life. It uses different tools such as biology, physics, chemistry, computer science, etc. Creating artificial life seems like a possible way to create AI, since most living systems on Earth seem to exhibit some form of robust intelligent behavior.
Definition of artificial life from Carlos Gershenson in (Gershenson 2021): Beginning in the mid-1980s, ALife has studied living systems using a synthetic approach: building life to understand it better (Aguilar et al.Notes on: Evolving a self-repairing, self-regulating, French flag organism by Miller, J. F. (2004)https://hugocisneros.com/notes/millerevolvingselfrepairingselfregulating2004/Tue, 19 Apr 2022 14:36:00 +0200https://hugocisneros.com/notes/millerevolvingselfrepairingselfregulating2004/ tags Cellular automata, Self-organization, Evolutionary strategies source (Miller 2004) Summary Comments Bibliography Julian Francis Miller. 2004. "Evolving a Self-repairing, Self-regulating, French Flag Organism". In Genetic and Evolutionary Computation Conference, 129–39. Springer. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.1049&rep=rep1&type=pdf.Few-shot learninghttps://hugocisneros.com/notes/few_shot_learning/Tue, 19 Apr 2022 13:31:00 +0200https://hugocisneros.com/notes/few_shot_learning/tags Machine learning, Transfer learning resources AI Multiple post Few-shot learning (FSL) can be considered a kind of meta-learning problem, where the model learns how to learn to solve different problems. FSL tasks are referred to as N-way K-shot tasks, where N is the number of classes in each task and K is the number of labeled examples per class. At test time, the model only sees K examples of each of the N classes it has to learn.Zero-shot learninghttps://hugocisneros.com/notes/zero_shot_learning/Tue, 19 Apr 2022 13:28:00 +0200https://hugocisneros.com/notes/zero_shot_learning/tags Few-shot learning, Machine learning, Transfer learning Zero-shot learning is a type of learning task where a model has to perform prediction in an output space never seen during training.
In the case of image classification, this corresponds to classifying images into classes never seen during training.Computer visionhttps://hugocisneros.com/notes/computer_vision/Tue, 19 Apr 2022 13:24:00 +0200https://hugocisneros.com/notes/computer_vision/ tags Machine learning, Image processing Tasks Image classification, Semantic segmentation, Object detection, Image generation.Cryptographyhttps://hugocisneros.com/notes/cryptography/Sun, 17 Apr 2022 13:13:00 +0200https://hugocisneros.com/notes/cryptography/ tags Applied maths, Computer scienceApplied mathshttps://hugocisneros.com/notes/applied_maths/Sun, 17 Apr 2022 13:10:00 +0200https://hugocisneros.com/notes/applied_maths/ tags MathematicsEmergencehttps://hugocisneros.com/notes/emergence/Sun, 17 Apr 2022 13:10:00 +0200https://hugocisneros.com/notes/emergence/ tags Complexity, PhysicsAttractorhttps://hugocisneros.com/notes/attractor/Thu, 14 Apr 2022 20:05:00 +0200https://hugocisneros.com/notes/attractor/ tags Dynamical systems, PhysicsNotes on: Evolution in asynchronous cellular automata by Nehaniv, C. L. (2003)https://hugocisneros.com/notes/nehanivevolutionasynchronouscellular2003/Thu, 14 Apr 2022 19:32:00 +0200https://hugocisneros.com/notes/nehanivevolutionasynchronouscellular2003/tags Cellular automata, Evolution source (Nehaniv 2003) Summary This paper proposes a general asynchronous extension of CA rules and shows that the extended rules can be made equivalent to the original ones. Applying this extension to H. Sayama’s Evoloop cellular automaton (Sayama 1999), the author creates the first asynchronous implementation of evolution of self-replicators.
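To make the difference between synchronous and asynchronous updating concrete, here is a minimal Python sketch on an elementary cellular automaton. The choice of rule 110, the periodic (ring-shaped) grid, and the function names are illustrative assumptions.

- The synchronous step computes every cell from the previous configuration.
- The naive asynchronous step updates cells in place, one at a time, in a given order.

This naive scheme is not Nehaniv's extension. Its result generally depends on the update order and differs from the synchronous rule, whereas the extension summarized in this note is designed to be equivalent to the original rule.

```python
import random


def rule_output(rule, left, center, right):
    """New state of a cell under an elementary CA rule in Wolfram's numbering."""
    index = (left << 2) | (center << 1) | right  # neighborhood read as a 3-bit number
    return (rule >> index) & 1


def sync_step(cells, rule):
    """Synchronous update: every cell reads the previous configuration (periodic boundary)."""
    n = len(cells)
    return [rule_output(rule, cells[i - 1], cells[i], cells[(i + 1) % n]) for i in range(n)]


def async_step(cells, rule, order):
    """Naive asynchronous update: cells are overwritten one at a time in the given order,
    so a cell may read neighbors that were already updated during the same sweep."""
    cells = list(cells)  # work on a copy so the input is left unchanged
    n = len(cells)
    for i in order:
        cells[i] = rule_output(rule, cells[i - 1], cells[i], cells[(i + 1) % n])
    return cells


if __name__ == "__main__":
    start = [0, 0, 0, 1, 0, 0, 0]
    print(sync_step(start, 110))                     # [0, 0, 1, 1, 0, 0, 0]
    print(async_step(start, 110, range(6, -1, -1)))  # [1, 1, 1, 1, 0, 0, 0]
    rng = random.Random(0)
    print(async_step(start, 110, rng.sample(range(7), 7)))  # depends on the sampled order
```

In the example, the same initial configuration gives different results under synchronous updating and under a right-to-left asynchronous sweep.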
One hope formulated by the author is that asynchronicity could help achieve fault tolerance and self-repair, which are notoriously difficult to achieve in CA in general.Emacshttps://hugocisneros.com/notes/emacs/Thu, 14 Apr 2022 18:17:00 +0200https://hugocisneros.com/notes/emacs/tags Coding Emacs is many things, including a general-purpose text editor used for writing code or any other text. Emacs uses Emacs Lisp (Elisp) for configuration code and scripting. Tips Delete buffers in helm view In the helm buffer list view, individual buffers can be selected with C-Space. Once the buffers you want to delete are selected, M-D will delete them and close helm.Christopher Langtonhttps://hugocisneros.com/notes/christopher_langton/Tue, 12 Apr 2022 14:15:00 +0200https://hugocisneros.com/notes/christopher_langton/tags Artificial life, Complex Systems He is a researcher in the field of artificial life and complex systems. He developed many tools and systems that the ALife community still uses today. He studied many interesting properties of cellular automata, including the influence of the $\lambda$ parameter on the chaotic behavior of these CA. A phase transition occurs for certain values of $\lambda$, a regime that was named the edge of chaos (Langton 1990).Language modelinghttps://hugocisneros.com/notes/language_modeling/Mon, 11 Apr 2022 15:07:00 +0200https://hugocisneros.com/notes/language_modeling/tags NLP LM with RNNs Different models have been studied, starting with the original recurrent neural network-based language model (Mikolov et al. 2011). LSTM recurrent neural networks were then used with more success than previous models (Zaremba et al. 2015). Recently, transformers seem to have dominated language modeling.
However, it is not clear whether this is due to genuine superiority over RNNs or to their practical scalability (Merity 2019).Word vectorshttps://hugocisneros.com/notes/word_vectors/Thu, 07 Apr 2022 19:33:00 +0200https://hugocisneros.com/notes/word_vectors/tags NLP Definition Word vectors are abstract representations of words embedded in a dense vector space. They are closely related to Language modeling, since the implicit representation a language model builds for prediction can often be used as a word (or sentence) vector. Word vectors can be extracted from the intermediate representations of RNNs or transformers. They can also be created with dedicated algorithms such as Word2Vec. Usage Word vectors can encode interesting information, such as semantic similarity between words.Talk: Alife 2020 keynote Luis Zaman - New Frontiers in Alife: What was old is new againhttps://hugocisneros.com/notes/talk_alife_2020_keynote_luis_zaman_new_frontiers_in_alife_what_was_old_is_new_again/Thu, 07 Apr 2022 19:32:00 +0200https://hugocisneros.com/notes/talk_alife_2020_keynote_luis_zaman_new_frontiers_in_alife_what_was_old_is_new_again/tags Artificial life, ALife 2020 This keynote is about the sub-community of ALife dedicated to constructing actual artificial systems that can exhibit Open-ended Evolution and Life-like behavior. The first models that tried to construct ALife were probably cellular automata: von Neumann’s self-reproducing CA and Langton’s loop. However, their main limitation was that they were extremely brittle, which is why evolution did not really work in them. Zaman’s definition of evolution isNeural networkshttps://hugocisneros.com/notes/neural_networks/Wed, 06 Apr 2022 13:48:00 +0200https://hugocisneros.com/notes/neural_networks/tags Machine learning Two-layer neural network Mathematically, a simple two-layer neural network with ReLU non-linearities can be written as follows.
For an input vector $x \in \mathbb{R}^D$, $\mathbf{a} = (a_1, \cdots, a_m)\in \mathbb{R}^m$ are the output weights and $\mathbf{b} = (b_1, \cdots, b_m)$, with each $b_i \in \mathbb{R}^D$, are the input weights: \[ h(x) = \frac{1}{m} \sum_{i=1}^m a_i \max\{ b_i^\top x,0\}. \] Universal approximation theorem Cybenko showed in 1989 that a neural network with a single hidden layer of arbitrary width and sigmoid activation functions can approximate any continuous function on the unit hypercube to arbitrary accuracy (Cybenko 1989).Org-roamhttps://hugocisneros.com/notes/org_roam/Wed, 06 Apr 2022 13:30:00 +0200https://hugocisneros.com/notes/org_roam/tags Emacs, Org-mode, Writing website Org-roam Org-roam is an org-mode package that implements features similar to Roam Research: bi-directional links between notes (a link from a note to another note is also a “backlink” from the latter to the former), the possibility to reference whole notes or subtrees transparently, citations and references as links, and many others. It was created by Jethro Kuan. Org-roam is useful for managing a personal knowledge base in plain text while taking advantage of the powerful features of Org-mode.Rusthttps://hugocisneros.com/notes/rust/Wed, 06 Apr 2022 13:28:00 +0200https://hugocisneros.com/notes/rust/ tags Programming languages, Coding Interesting repos The Rust Python interpreter: linkWritinghttps://hugocisneros.com/notes/writing/Wed, 06 Apr 2022 12:59:00 +0200https://hugocisneros.com/notes/writing/Kuznets curvehttps://hugocisneros.com/notes/kuznets_curve/Tue, 05 Apr 2022 15:52:00 +0200https://hugocisneros.com/notes/kuznets_curve/tags Climate, Economics From (Henriques, Böhm 2022): This is based on the work of the U.S. American economist Simon Kuznets who, in the 1950s and 1960s, argued that social inequalities first increase with rising economic growth before decreasing over time (Kuznets’ 1955). Kuznets curve for pollution and climate science Bibliography Irene Henriques, Steffen Böhm. May 2022. "The Perils of Ecologically Unequal Exchange: Contesting Rare-earth Mining in Greenland".
Journal of Cleaner Production 349 (May):131378.Dirichlet distributionhttps://hugocisneros.com/notes/dirichlet_distribution/Thu, 31 Mar 2022 12:39:00 +0200https://hugocisneros.com/notes/dirichlet_distribution/ tags ProbabilitiesProbabilitieshttps://hugocisneros.com/notes/probabilities/Thu, 31 Mar 2022 12:39:00 +0200https://hugocisneros.com/notes/probabilities/ tags MathematicsReservoir computinghttps://hugocisneros.com/notes/reservoir_computing/Tue, 15 Mar 2022 11:41:00 +0100https://hugocisneros.com/notes/reservoir_computing/tags Machine learning, Unconventional computing, Unsupervised learning Reservoir computing is a term used to describe a class of machine learning algorithms that rely on the transient dynamics of a dynamical system to implement and manipulate goal-related information. The most famous example is echo-state networks, which use random recurrent neural networks as reservoirs, but other dynamical systems can also be used. Reservoir computing with cellular automata Reservoir computing can use cellular automata as the reservoir.Unconventional computinghttps://hugocisneros.com/notes/unconventional_computing/Tue, 15 Mar 2022 11:36:00 +0100https://hugocisneros.com/notes/unconventional_computing/papers (Stepney 2012) Molecular computing (Kompa, Levine 2001), DNA computing (Zhang, Ye 2012), Light computing (Pittman et al. 2003), Computing in physical material: computing (Fang et al. 2016; Stern et al. 2021), storing and encoding memory (Pashine et al. 2019; Chen et al. 2021). Bibliography Susan Stepney. 2012. "Nonclassical Computation — A Dynamical Systems Perspective". In Handbook of Natural Computing, edited by Grzegorz Rozenberg, Thomas Bäck, and Joost N. Kok, 1979–2025.Biological lifehttps://hugocisneros.com/notes/life/Tue, 15 Mar 2022 11:21:00 +0100https://hugocisneros.com/notes/life/This note refers to the only biological life we’ve observed so far: the one on Earth.
From Sara Walker’s keynote at ALife 2020: The Natural History of Information: Life is a process whereby information structures matter across space and time. Why should life be an emergent process? An answer from (Krakauer et al. 2020): The fact that physics and chemistry are universal—ongoing in stars, solar systems, and galaxies—whereas to the best of our knowledge biology is exclusively a property of earth, supports the view that life is emergent.Elementary cellular automatahttps://hugocisneros.com/notes/elementary_cellular_automata/Tue, 15 Mar 2022 11:02:00 +0100https://hugocisneros.com/notes/elementary_cellular_automata/tags Cellular automata resources Wolfram MathWorld, Wikipedia ECA are among the simplest possible forms of 1D cellular automata. The grid is a 1-dimensional array of cells in state 0 or 1 (dead or alive). The neighborhood used for the update has size 3 (one cell to the left, the main cell and one cell to the right). Each of the $2^3 = 8$ possible neighborhoods can be mapped to either state 1 or 0, which gives $2^{2^3} = 256$ possible rules.Gradient descenthttps://hugocisneros.com/notes/gradient_descent/Mon, 07 Mar 2022 16:57:00 +0100https://hugocisneros.com/notes/gradient_descent/tags Optimization, Algorithm resources Slides by Christian S. Perone Fixed learning rate The simplest way to apply the gradient descent algorithm to a convex, $L$-smooth function $g$ on $\mathbb{R}^d$ is to use the parameter update: \[ \theta_t = \theta_{t-1} - \gamma g'(\theta_{t-1}) \] where $\gamma > 0$ is the fixed learning rate. This is based on the standard first-order approximation of the function $g$. 
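A minimal sketch of the fixed-learning-rate update in Python. The objective is a toy quadratic chosen for illustration (it is not from the note): $g(\theta) = \frac{1}{2}(\theta_1^2 + 10\,\theta_2^2)$ has Hessian $\mathrm{diag}(1, 10)$, so it is convex and $L$-smooth with $L = 10$. With $\gamma = 1/L$ the iterates converge to the minimizer $0$; with a step larger than $2/L$ the stiff coordinate oscillates with growing amplitude, which shows how sensitive the method is to the learning rate.

```python
from typing import Callable, List

Vector = List[float]


def gradient_descent(grad: Callable[[Vector], Vector], theta0: Vector,
                     gamma: float, n_steps: int) -> Vector:
    """Fixed learning rate: theta_t = theta_{t-1} - gamma * g'(theta_{t-1})."""
    theta = list(theta0)
    for _ in range(n_steps):
        g = grad(theta)
        theta = [t - gamma * gi for t, gi in zip(theta, g)]
    return theta


# Toy objective (an assumption for illustration): g(theta) = 0.5 * (theta_1^2 + 10 * theta_2^2).
# Its Hessian is diag(1, 10), so g is convex and L-smooth with L = 10.
CURVATURES = [1.0, 10.0]


def grad_quadratic(theta: Vector) -> Vector:
    return [c * t for c, t in zip(CURVATURES, theta)]


L = max(CURVATURES)

# gamma = 1/L: each coordinate is multiplied by (1 - gamma * c), which lies in [0, 1).
theta = gradient_descent(grad_quadratic, [1.0, 1.0], gamma=1.0 / L, n_steps=200)

# gamma = 0.25 > 2/L: the second coordinate is multiplied by 1 - 2.5 = -1.5 and blows up.
diverged = gradient_descent(grad_quadratic, [1.0, 1.0], gamma=0.25, n_steps=50)

print(theta)
print(diverged)
```

On a quadratic, the per-coordinate factor $1 - \gamma c$ makes the role of $\gamma$ explicit: convergence needs $|1 - \gamma c| < 1$ for every curvature $c$, i.e. $\gamma < 2/L$.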
It can be very sensitive to the choice of learning rate and can suffer from pathological curvature.Rademacher complexityhttps://hugocisneros.com/notes/rademacher_complexity/Mon, 07 Mar 2022 16:44:00 +0100https://hugocisneros.com/notes/rademacher_complexity/tags Machine learning Definition Given a function class $\{f_w\}$, inputs $X_\mu$ and random iid labels $y_\mu \in \{\pm 1\}$, the Rademacher complexity is \[ \mathscr{R}_n = \mathbb{E}_{y, X} \sup_w \frac{1}{n} \sum_{\mu = 1}^n y_\mu f_w(X_\mu) \] It measures how well the function class can fit a dataset with random labels. (Bartlett, Mendelson 2002) shows bounds for the Rademacher complexity in terms of $\ell_1$ norm bounds on the weights of the network. However, (Zhang et al.Compressed sensinghttps://hugocisneros.com/notes/compressed_sensing/Mon, 07 Mar 2022 16:07:00 +0100https://hugocisneros.com/notes/compressed_sensing/tags Signal processing resources (Candes et al. 2006) Description Compressed sensing is a technique to recover a sparse signal from partial observations. The signal is described as an $N$-dimensional vector $\textbf{s}$. We make $M$ measurements, where a measurement is a projection of the signal $\textbf{s}$ onto some known vector. The result of all these measurements can be written as $\textbf{y} = \textbf{Fs}$, where $\textbf{F}$ is an $M \times N$ matrix.Double descenthttps://hugocisneros.com/notes/double_descent/Mon, 07 Mar 2022 15:59:00 +0100https://hugocisneros.com/notes/double_descent/tags Neural network training, Neural networks resources (Belkin et al. 2019) Double descent is a phenomenon usually observed in neural networks, where the usual bias-variance tradeoff seems to break down: beyond the point where the network can interpolate the training data, test error decreases again as we further over-parametrize the network or add more training examples. This was observed for over-parametrized neural networks in (Geman et al. 1992). An illustration from (caption is also adapted from the paper) (Belkin et al. 
2019):Bias-variance tradeoffhttps://hugocisneros.com/notes/bias_variance_tradeoff/Mon, 07 Mar 2022 15:56:00 +0100https://hugocisneros.com/notes/bias_variance_tradeoff/ tags Machine learning, StatisticsLevenshtein automatahttps://hugocisneros.com/notes/levenshtein_automata/Mon, 07 Mar 2022 09:17:00 +0100https://hugocisneros.com/notes/levenshtein_automata/tags Algorithm, Finite state machines resources Nick’s blog A Levenshtein automaton recognizes exactly the strings within a given Levenshtein distance of a target word, which makes it useful to find such strings efficiently.Levenshtein distancehttps://hugocisneros.com/notes/levenshtein_distance/Mon, 07 Mar 2022 09:15:00 +0100https://hugocisneros.com/notes/levenshtein_distance/ tags Algorithm, Natural language processingNLPhttps://hugocisneros.com/notes/nlp/Mon, 07 Mar 2022 09:15:00 +0100https://hugocisneros.com/notes/nlp/ tags Machine learning, Language NLP is about creating algorithms that can manipulate and use language. It is often thought that having functioning NLP algorithms that provably “understand” language would be equivalent to reaching human-level Artificial Intelligence. Tasks Language modeling Text classification Question answering Data manipulation There are several ways to encode text data. One-hot encoding of characters One-hot encoding of words Byte-pair encoding, which can be seen as a compromise between the two.Finite state machineshttps://hugocisneros.com/notes/finite_state_machines/Mon, 07 Mar 2022 09:05:00 +0100https://hugocisneros.com/notes/finite_state_machines/ tags Theory of computation Finite automata Finite state transducers Implementations State machines in Rust: link. In Python: python-statemachineNotes on: Transformer Memory as a Differentiable Search Index by Tay, Y., Tran, V. Q., Dehghani, M., Ni, J., Bahri, D., Mehta, H., Qin, Z., … (2022)https://hugocisneros.com/notes/taytransformermemorydifferentiable2022/Thu, 17 Feb 2022 15:46:00 +0100https://hugocisneros.com/notes/taytransformermemorydifferentiable2022/ source (Tay et al. 
2022) Summary Comments Bibliography Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, et al. February 16, 2022. "Transformer Memory as a Differentiable Search Index". Arxiv:2202.06991 [cs]. http://arxiv.org/abs/2202.06991.Notes on: One model for the learning of language by Yang, Y., & Piantadosi, S. T. (2022)https://hugocisneros.com/notes/yangonemodellearning2022/Mon, 31 Jan 2022 13:55:00 +0100https://hugocisneros.com/notes/yangonemodellearning2022/source (Yang, Piantadosi 2022) tags NLP, Artificial Intelligence, Machine learning Summary This paper introduces a model for learning language from few examples while generalizing effectively. This model builds sentences with functions that are combinations of elementary functions, including: pair(L, C) : Concatenates character C onto list L first(L) : Return the first character of L flip(P) : Return true with probability P if(B, X, Y) : Return X if B else return Y (X and Y may be lists, sets, or probabilities) etc.Notes on: Next Generation Reservoir Computing by Gauthier, D. J., Bollt, E., Griffith, A., & Barbosa, W. A. S. (2021)https://hugocisneros.com/notes/gauthiernextgenerationreservoir2021/Sat, 29 Jan 2022 14:51:00 +0100https://hugocisneros.com/notes/gauthiernextgenerationreservoir2021/source (Gauthier et al. 2021) tags Reservoir computing Summary This paper builds on the demonstration that some reservoir computers (echo-state networks) are mathematically identical to nonlinear vector autoregression (NVAR) machines (Bollt 2021). An NVAR is a regression over a feature vector composed of $k$ time-delay observations of the dynamical system to be learned and nonlinear functions of these observations. The authors introduce Next-Generation Reservoir computing (NG-RC), which is essentially an NVAR.Notes on: Formal Definitions of Unbounded Evolution and Innovation Reveal Universal Mechanisms for Open-Ended Evolution in Dynamical Systems by Adams, A., Zenil, H., Davies, P. C. 
W., & Walker, S. I. (2017)https://hugocisneros.com/notes/adamsformaldefinitionsunbounded2017/Thu, 27 Jan 2022 12:05:00 +0100https://hugocisneros.com/notes/adamsformaldefinitionsunbounded2017/source (Adams et al. 2017) tags Open-ended Evolution, Cellular automata Summary This paper defines two properties of dynamical systems which are claimed to be related to open-ended evolution: Unbounded evolution (UE) and Innovation (INN). According to this paper’s definition, a system that combines these two properties is open-ended. For such properties to be possible, a system has to be decomposed into two entities that interact with each other:Moore-Penrose inversehttps://hugocisneros.com/notes/moore_penrose_inverse/Thu, 27 Jan 2022 09:53:00 +0100https://hugocisneros.com/notes/moore_penrose_inverse/tags Mathematics The MP inverse exists and is unique for any matrix $A$. When $A$ has linearly independent columns, the MP inverse $A^+$ is \[ A^+ = (A^* A)^{-1} A^* \] where $A^*$ is the conjugate transpose of $A$. If the rows are linearly independent, \[ A^+ = A^* (AA^*)^{-1} \].Jethro Kuanhttps://hugocisneros.com/notes/jethro_kuan/Wed, 26 Jan 2022 17:56:00 +0100https://hugocisneros.com/notes/jethro_kuan/website Personal website Jethro Kuan is a developer from Singapore. He developed Org-roam.Generative adversarial networkshttps://hugocisneros.com/notes/generative_adversarial_networks/Wed, 19 Jan 2022 12:14:00 +0100https://hugocisneros.com/notes/generative_adversarial_networks/tags Neural networks, Generative modelling Generative adversarial networks are a type of generative model. They are close in spirit to Variational autoencoders but have key differences. The main one is the way the model is trained, which relies on an adversarial equilibrium between training a generator and training a discriminator. Are GANs glorified PCA? 
(Richardson, Weiss 2020) This paper seems to show that image-to-image translation models are ill-posed and imply the image transformation should always be very local.ALife Conferencehttps://hugocisneros.com/notes/alife_conference/Wed, 19 Jan 2022 12:13:00 +0100https://hugocisneros.com/notes/alife_conference/ tags Artificial life The ALife conference is about artificial life in its many forms. This includes topics like cellular automata, complex systems, biological life, etc. ALife 2020Codinghttps://hugocisneros.com/notes/coding/Wed, 19 Jan 2022 12:11:00 +0100https://hugocisneros.com/notes/coding/tags Computer science, Programming languages For compiled programming languages, code has to be translated to machine code through a process called compilation.Graph cellular automatahttps://hugocisneros.com/notes/graph_cellular_automata/Mon, 17 Jan 2022 17:08:00 +0100https://hugocisneros.com/notes/graph_cellular_automata/tags Cellular automata, Graphs This concept was mentioned in (O’Sullivan 2001), although it may not be the first ever mention of it. The idea is also similar to graph convolutional networks and other graph neural networks, where the goal is to construct an update function for a node that doesn’t depend on the number of neighboring nodes. This enables running cellular automata on non-grid structures. This idea was recently published at NeurIPS 2021 (Grattarola et al.Cellular neural networkshttps://hugocisneros.com/notes/cellular_neural_networks/Mon, 17 Jan 2022 16:44:00 +0100https://hugocisneros.com/notes/cellular_neural_networks/ tags Cellular automata, Neural networks resources Scholarpedia, (Chua, Yang 1988; Chua, Yang 1988) Bibliography L.O. Chua, L. Yang. October 1988. "Cellular Neural Networks: Applications". IEEE Transactions on Circuits and Systems 35 (10):1273–90. DOI. L.O. Chua, L. Yang. October 1988. "Cellular Neural Networks: Theory". IEEE Transactions on Circuits and Systems 35 (10):1257–72. 
DOI.Intrinsic motivationhttps://hugocisneros.com/notes/intrinsic_motivation/Mon, 17 Jan 2022 13:44:00 +0100https://hugocisneros.com/notes/intrinsic_motivation/tags Reinforcement learning, Robotics references: According to (Ryan, Deci 2000) (p. 56), Intrinsic motivation is defined as the doing of an activity for its inherent satisfaction rather than for some separable consequence. When intrinsically motivated, a person is moved to act for the fun or challenge entailed rather than because of external products, pressures, or rewards. It is defined by contrast with extrinsic motivation: Extrinsic motivation is a construct that pertains whenever an activity is done in order to attain some separable outcome.Supervised learninghttps://hugocisneros.com/notes/supervised_learning/Thu, 04 Nov 2021 14:23:00 +0100https://hugocisneros.com/notes/supervised_learning/tags Machine learning Data Input/output example pairs: \[ \{(x_i, y_i)\}_{i\leq n} \sim_{iid} \mathbb{P}, \quad \mathbb{P} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) \text{ unknown} \] Mapping We search for a mapping $f: \mathcal{X} \rightarrow \mathcal{Y}$. It is also common to parameterize this mapping with a parameter $\theta \in \mathbb{R}^d$ and write $h: \mathcal{X} \times \mathbb{R}^d \rightarrow \mathcal{Y}$. The prediction $\hat{y}$ is written \[ \hat{y} = f(x) = h(x, \theta) \] Objective The goal is to find the above mapping so as to minimize an objective.Catastrophic forgettinghttps://hugocisneros.com/notes/catastrophic_forgetting/Mon, 18 Oct 2021 09:53:00 +0200https://hugocisneros.com/notes/catastrophic_forgetting/tags Machine learning Catastrophic forgetting is the name given to a common problem of machine learning models: when training on some new data from a new distribution (a new “task”), many models forget what they learned from the first task. 
This isn’t surprising, since models minimize a loss function that is often applied solely to the task at hand and does not constrain the model to retain past information.Conway's Game of Lifehttps://hugocisneros.com/notes/conway_s_game_of_life/Mon, 18 Oct 2021 08:56:00 +0200https://hugocisneros.com/notes/conway_s_game_of_life/tags Cellular automata resources (Gardner 1970) It is one of the most famous Cellular automata rules, invented as a game by mathematician John Conway. Learn the game of life with a neural network This paper investigates how hard it is for neural networks to approximate the Game of Life rule (Springer, Kenyon 2020). Bibliography Martin Gardner. October 1970. "Mathematical Games". Scientific American 223 (4):120–23. DOI. Jacob M. Springer, Garrett T.Variational autoencodershttps://hugocisneros.com/notes/variational_autoencoders/Thu, 07 Oct 2021 13:37:00 +0200https://hugocisneros.com/notes/variational_autoencoders/tags Neural networks resources (Bishop 1994) Variational autoencoders (VAEs) are a type of generative Autoencoder. They use a Bayesian latent encoding for the input dataset. VAEs vs. GANs VAEs fell out of fashion when GANs became popular, because GANs were able to produce visually interesting results more easily. However, some works a few years later seem to show that VAEs have similar potential (Vahdat, Kautz 2020). Bibliography Christopher M.Compressionhttps://hugocisneros.com/notes/compression/Tue, 05 Oct 2021 17:52:00 +0200https://hugocisneros.com/notes/compression/Compression with Neural networks Neural networks can be used for compression as estimators of the probability of a sequence’s next character (Schmidhuber, Heil 1996). Compression as a measure of Artificial Intelligence Mahoney argues in (Mahoney 1999) that being able to compress information amounts to being able to optimally predict the distribution of a natural language input corpus. 
A good compression algorithm “learns” features of the language to make better predictions.Graph compressionhttps://hugocisneros.com/notes/graph_compression/Thu, 30 Sep 2021 14:47:00 +0200https://hugocisneros.com/notes/graph_compression/ tags Compression, Graphs (Bouritsas et al. 2021) Bibliography Giorgos Bouritsas, Andreas Loukas, Nikolaos Karalias, Michael M. Bronstein. July 5, 2021. "Partition and Code: Learning How to Compress Graphs". Arxiv:2107.01952 [cs, Math, Stat]. http://arxiv.org/abs/2107.01952.Neural networks as dynamical systemshttps://hugocisneros.com/notes/neural_networks_as_dynamical_systems/Thu, 30 Sep 2021 14:46:00 +0200https://hugocisneros.com/notes/neural_networks_as_dynamical_systems/tags Neural networks, Dynamical systems Neural networks can be seen as dynamical systems in different contexts. Recurrent networks With Recurrent neural networks, the continuous dynamical system analogy is very striking. These networks evolve progressively in time by updating an internal state with a fixed algorithm. Usually the state dynamics are not studied, because the recurrent network is designed to complete some fixed task. The notion of attractor can be defined for such networks, which relates them to attractor networks.Learning in dynamical systemshttps://hugocisneros.com/notes/learning_in_dynamical_systems/Thu, 30 Sep 2021 12:56:00 +0200https://hugocisneros.com/notes/learning_in_dynamical_systems/ tags Machine learning, Dynamical systems resources (Weinan 2017) Bibliography E Weinan. March 2017. "A Proposal on Machine Learning via Dynamical Systems". Communications in Mathematics and Statistics 5 (1):1–11. 
DOI.Tropical semiringhttps://hugocisneros.com/notes/tropical_semiring/Thu, 30 Sep 2021 12:36:00 +0200https://hugocisneros.com/notes/tropical_semiring/tags Mathematics Definition The min tropical semiring is the semiring $(\mathbb{R} \cup \{ +\infty \}, \oplus, \otimes )$ with the operations: $x \oplus y = \min(x, y)$ $x \otimes y = x + y$ The unit for $\oplus$ is $+\infty$ and the unit for $\otimes$ is $0$. The max tropical semiring is defined similarly by replacing $\min$ with $\max$. Relation with shortest path algorithms There is an interesting connection between the min tropical semiring and Dijkstra’s algorithm.Arithmetic codinghttps://hugocisneros.com/notes/arithmetic_coding/Thu, 23 Sep 2021 09:07:00 +0200https://hugocisneros.com/notes/arithmetic_coding/ tags Compression, Entropy coding resources (Witten et al. 1987) Bibliography Ian H. Witten, Radford M. Neal, John G. Cleary. 1987. "Arithmetic Coding for Data Compression". Communications of the ACM 30 (6). ACM New York, NY, USA:520–40.Blue noisehttps://hugocisneros.com/notes/blue_noise/Thu, 23 Sep 2021 09:07:00 +0200https://hugocisneros.com/notes/blue_noise/ tags Noise resources (Wong, Wong 2017) Bibliography Kin-Ming Wong, Tien-Tsin Wong. June 1, 2017. "Blue Noise Sampling Using an N-body Simulation-based Method". The Visual Computer 33 (6):823–32. DOI.Entropy codinghttps://hugocisneros.com/notes/entropy_coding/Thu, 23 Sep 2021 08:50:00 +0200https://hugocisneros.com/notes/entropy_coding/ tags Compression, EntropyPoincaré recurrence timehttps://hugocisneros.com/notes/poincare_recurrence_time/Tue, 14 Sep 2021 22:50:00 +0200https://hugocisneros.com/notes/poincare_recurrence_time/tags Dynamical systems The Poincaré recurrence time for a finite dynamical system is the maximal theoretical time after which the system will return to its initial state and the trajectory will repeat. 
In the case of a cellular automaton on a grid of size $n$ with $k$ possible states per cell, the grid has only $k^n$ distinct configurations, so the recurrence time is $t_P = k^n$.Monerohttps://hugocisneros.com/notes/monero/Tue, 14 Sep 2021 22:05:00 +0200https://hugocisneros.com/notes/monero/ tags Cryptography resources MoneroRing signatureshttps://hugocisneros.com/notes/ring_signatures/Tue, 14 Sep 2021 22:01:00 +0200https://hugocisneros.com/notes/ring_signatures/tags Cryptography A ring signature is a protocol that allows a single entity from a group of public-private key pairs to sign a message in such a way that anyone can check the signature was indeed made by someone from that group, but it is impossible to tell who exactly. It should also be very hard to create a fake signature without knowing any of the private keys from the group.Cellular automata as recurrent neural networkshttps://hugocisneros.com/notes/cellular_automata_as_recurrent_neural_networks/Mon, 06 Sep 2021 16:06:00 +0200https://hugocisneros.com/notes/cellular_automata_as_recurrent_neural_networks/tags Cellular automata, Recurrent neural networks Since cellular automata are a kind of dynamical system, they may also be seen as a type of recurrent neural network. They can even be seen as a recurrent-convolutional network, because the update of each “hidden neuron” depends only on the neighboring neurons and the update rule is shared across the whole hidden state.Reductionismhttps://hugocisneros.com/notes/reductionism/Fri, 03 Sep 2021 15:14:00 +0200https://hugocisneros.com/notes/reductionism/tags Philosophy, Physics Reductionism is a philosophy according to which the laws of physics are relatively simple and could be expressed concisely. If we were to understand and model all those laws, we could explain any given phenomenon by breaking it down into its smaller parts until we just need to apply these simple laws. 
With the current state of physics, it seems we are close to understanding many of these fundamental laws (although major shortcomings remain in the theory).Notes on: Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Xiong, Y., Zeng, Z., Chakraborty, R., Tan, M., Fung, G., Li, Y., & Singh, V. (2021)https://hugocisneros.com/notes/xiongnystromformernystr2021/Thu, 02 Sep 2021 12:52:00 +0200https://hugocisneros.com/notes/xiongnystromformernystr2021/tags Transformers source (Xiong et al. 2021) Summary This paper describes a way of applying the Nyström method for low-rank matrix approximation to transformers. More precisely, the approximation is used in the self-attention mechanism’s softmax calculation. This approximation addresses one of the biggest downsides of attention: its computational complexity. The authors claim that their method reduces it from $O(n^2)$ to $O(n)$ in the sequence length $n$. The goal of the method is to efficiently approximate the matrixGraph neural networkshttps://hugocisneros.com/notes/graph_neural_networks/Thu, 02 Sep 2021 12:49:00 +0200https://hugocisneros.com/notes/graph_neural_networks/tags Neural networks, Graphs Basic properties To operate on graphs, a neural network must be invariant to isomorphism of these graphs. This translates to permutation invariance for the nodes of a graph. \[ f(\mathbf{PX}) = f(\mathbf{X}) \] where $\mathbf{P}$ is a permutation matrix. For simple sets, this amounts to performing node-wise transformations and using a permutation-invariant aggregator (sum/max/avg/…). This was done in (Zaheer et al. 2018). \[ f(\mathbf{X}) = \phi\left( \bigoplus_i \psi(\mathbf{x}_i) \right) \]Notes on: The information theory of individuality by Krakauer, D., Bertschinger, N., Olbrich, E., Flack, J. C., & Ay, N. 
(2020)https://hugocisneros.com/notes/krakauerinformationtheoryindividuality2020/Thu, 02 Sep 2021 12:46:00 +0200https://hugocisneros.com/notes/krakauerinformationtheoryindividuality2020/tags Information theory, Life source (Krakauer et al. 2020) Summary This paper introduces an information-theoretic definition of individuality for complex systems. In a few words, the authors’ idea of individuality is based on the amount of information transmitted through time. If the information transmitted forward in time is close to maximal, we take that as evidence for individuality. Formally, a system $\mathcal{S}$ is considered in interaction with an environment $\mathcal{E}$.Notes on: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention by Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020)https://hugocisneros.com/notes/katharopoulostransformersarernns2020/Thu, 02 Sep 2021 12:07:00 +0200https://hugocisneros.com/notes/katharopoulostransformersarernns2020/tags Transformers, RNN source (Katharopoulos et al. 2020) Summary Transformers have traditionally been described as different models from RNNs. This is because instead of processing the sequence one token at a time, Transformers use attention to process all elements simultaneously. The paper introduces an interesting new formulation, replacing the softmax attention with a feature map-based dot product. This new formulation yields better time and memory complexity as well as a model that is causal and autoregressive (similar to RNNs).Notes on: Growing Neural Cellular Automata by Mordvintsev, A., Randazzo, E., Niklasson, E., & Levin, M. (2020)https://hugocisneros.com/notes/mordvintsevgrowingneuralcellular2020/Wed, 01 Sep 2021 17:31:00 +0200https://hugocisneros.com/notes/mordvintsevgrowingneuralcellular2020/tags Cellular automata source (Mordvintsev et al. 2020) Summary This paper introduces interesting ideas for training cellular automata as CNNs to have self-repairing stable structures. 
The automata have 16-dimensional continuous states. The main modeling ideas are: Use hard-coded filters for the initial perception step. The filters are two Sobel convolutions (horizontal and vertical gradients), whose outputs are concatenated with the current state. Update rules are then 1×1 convolutions applied to the resulting $3 \times 16 = 48$-dimensional perception vector.Cellular automata as convolutional neural networkshttps://hugocisneros.com/notes/cellular_automata_as_cnns/Wed, 01 Sep 2021 17:30:00 +0200https://hugocisneros.com/notes/cellular_automata_as_cnns/tags Cellular automata, Convolutional neural networks Motivation Cellular automata are computational models based on several principles such as translation invariance and parallelism of computations. These principles also motivated the creation of Convolutional neural networks (used initially for images and text), making this model well-suited to reason about cellular automata. There is indeed a deep connection between the two models, as if they were two expressions of the same idea: spatially organized information can be processed locally in parallel.Computing in cellular automatahttps://hugocisneros.com/notes/computing_in_cellular_automata/Wed, 01 Sep 2021 17:30:00 +0200https://hugocisneros.com/notes/computing_in_cellular_automata/tags Unconventional computing, Cellular automata resources (Mitchell 2005; Wolfram 2002) Cellular automata are computational models capable of interesting emergent behavior. A major challenge is to understand which CA rules are doing useful or efficient computations. It is not clear how these systems could be programmed or made to compute a particular function. Hand-engineered CA rules The images below show CA rules that can compute non-trivial functions (Images are from (Wolfram 2002), see A New Kind of Science online ).Epsilon machineshttps://hugocisneros.com/notes/epsilon_machines/Wed, 01 Sep 2021 17:30:00 +0200https://hugocisneros.com/notes/epsilon_machines/(Crutchfield, Young 1989) Bibliography James P. 
Crutchfield, Karl Young. July 10, 1989. "Inferring Statistical Complexity". Physical Review Letters 63 (2):105–8. DOI.Style transferhttps://hugocisneros.com/notes/style_transfer/Wed, 01 Sep 2021 17:30:00 +0200https://hugocisneros.com/notes/style_transfer/tags Computer vision Style transfer is the process of transferring some visual features from one image to another image while preserving the latter’s content information. Since both notions (style and content) may be considered subjective, the problem of style transfer is not well defined and may be approached in many ways. Style transfer with CNNs This is an early example of style transfer with convolutional neural networks: (Gatys et al. 2016)Notes on: Reservoir Computing meets Recurrent Kernels and Structured Transforms by Dong, J., Ohana, R., Rafayelyan, M., & Krzakala, F. (2020)https://hugocisneros.com/notes/dongreservoircomputingmeets2020/Mon, 30 Aug 2021 22:02:00 +0200https://hugocisneros.com/notes/dongreservoircomputingmeets2020/source (Dong et al. 2020) tags Reservoir computing, Kernel Methods Summary This paper presents a connection between reservoir computing with large reservoirs and kernel methods. The authors formulate a reservoir computing model as a form of recurrent kernel iteration. 
If the reservoir update is written \[ x^{(t+1)} = \dfrac{1}{\sqrt{N}} f \left(W_r x^{(t)} + W_i i^{(t)} \right) \] with $x^{(t)} \in \mathbb{R}^N$ the state of the reservoir at time $t$ and $i^{(t)} \in \mathbb{R}^d$ the input at time $t$, $W_r \in \mathbb{R}^{N\times N}$ and $W_i \in \mathbb{R}^{N\times d}$, we may re-frame it as a random feature embedding of the vector $\left[ x^{(t)} , i^{(t)} \right]$ with the matrix $W = [W_r, W_i]$.Differential equationshttps://hugocisneros.com/notes/differential_equations/Mon, 30 Aug 2021 17:12:00 +0200https://hugocisneros.com/notes/differential_equations/ tags MathematicsSoftmaxhttps://hugocisneros.com/notes/softmax/Fri, 27 Aug 2021 15:41:00 +0200https://hugocisneros.com/notes/softmax/ tags Applied maths The Softmax can refer to two mathematical functions: In machine learning, a softmax is the function which normalizes a vector of values to a probability vector: $\text{softmax}(\mathbf{x}) = \dfrac{e^{\mathbf{x}}}{\sum_i e^{x_i}}$ where $\mathbf{x} = (x_i) \in \mathbb{R}^n$ and the exponential in the numerator is applied element-wise. This function could also be called soft-argmax because it is a smooth approximation of the discrete argmax function. It may also refer to a smoothed maximum function like $\epsilon \log \sum_i \exp (x_i / \epsilon)$, which approximates the $\text{max}$ function in the limit $\epsilon \rightarrow 0$.Notes on: Learned Initializations for Optimizing Coordinate-Based Neural Representations by Tancik, M., Mildenhall, B., Wang, T., Schmidt, D., Srinivasan, P. P., Barron, J. T., & Ng, R. (2021)https://hugocisneros.com/notes/tanciklearnedinitializationsoptimizing2021/Wed, 25 Aug 2021 16:39:00 +0200https://hugocisneros.com/notes/tanciklearnedinitializationsoptimizing2021/tags Neural radiance fields, Meta-learning source (Tancik et al. 2021) Summary This paper explores meta-learning techniques for improving the quality and speed of convergence of learned implicit neural representations. 
The authors use meta-learning to optimize the initial weights $\theta_0$ of the neural network such that it minimizes the loss $L(\theta_m)$, where $\theta_m$ are the weights after $m$ optimization steps on new, unseen observations. As a meta-learning problem, there is an inner loop and an outer loop:Meta-learninghttps://hugocisneros.com/notes/meta_learning/Wed, 25 Aug 2021 16:00:00 +0200https://hugocisneros.com/notes/meta_learning/tags Machine learning Constrained meta-learning (Kirsch, Schmidhuber 2021) Meta-learning of initialization The goal is to learn the initialization of neural network parameters or recurrent neural network initial states in order to make the training faster or less prone to getting stuck in local minima. Example for implicit neural representations: (Tancik et al. 2021) Meta-learning algorithms MAML (Finn et al. 2017) Reptile (Nichol et al. 2018) Bibliography Louis Kirsch, Jürgen Schmidhuber.Notes on: Pretrained Transformers as Universal Computation Engines by Lu, K., Grover, A., Abbeel, P., & Mordatch, I. (2021)https://hugocisneros.com/notes/lupretrainedtransformersuniversal2021/Wed, 25 Aug 2021 15:19:00 +0200https://hugocisneros.com/notes/lupretrainedtransformersuniversal2021/source (Lu et al. 2021) tags Transformers Summary Different types of neural network architecture encode different kinds of biases. For example, convolutional neural networks perform local, translation-invariant operations and recurrent neural networks operate on sequential data. One can use these biases in randomly initialized networks as a basis for interesting computations. This is one of the motivations for reservoir computing with echo-state networks, which use a fixed random recurrent neural network and a simple trainable linear transformation to perform complex computations.Notes on: Thinking Like Transformers by Weiss, G., Goldberg, Y., & Yahav, E. 
(2021)https://hugocisneros.com/notes/weissthinkingtransformers2021/Wed, 25 Aug 2021 15:19:00 +0200https://hugocisneros.com/notes/weissthinkingtransformers2021/tags NLP, Computer science source (Weiss et al. 2021) Summary This paper introduces a programming language that is inspired by the way Transformers process input data. The language is called Restricted Access Sequence Processing Language (RASP). Data is represented as sequences, which is the structure transformers manipulate (since they have been designed for NLP applications). The language has two types of internal data representation: Sequence operators (s-ops) are functions that translate sequences into sequences.System of linear equationshttps://hugocisneros.com/notes/system_of_linear_equations/Tue, 24 Aug 2021 15:04:00 +0200https://hugocisneros.com/notes/system_of_linear_equations/tags Applied maths Such a system with $m$ equations and $n$ unknowns is often denoted $Ax = b$, where $A$ is an $m\times n$ matrix and $b$ is a vector of size $m$. There are multiple methods to solve such systems, each relying on a different set of hypotheses. System types Square matrix with full rank In the simplest case, a square matrix with full rank, the solution exists and is unique: $x = A^{-1} b$.Knuth-Morris-Pratt string-searching algorithmhttps://hugocisneros.com/notes/knuth_morris_pratt_string_searching_algorithm/Tue, 24 Aug 2021 11:10:00 +0200https://hugocisneros.com/notes/knuth_morris_pratt_string_searching_algorithm/ tags Algorithm, Computer science resources Yurichev.comNeural cellular automata and implicit representationshttps://hugocisneros.com/notes/neural_cellular_automata_and_implicit_representations/Fri, 20 Aug 2021 15:30:00 +0200https://hugocisneros.com/notes/neural_cellular_automata_and_implicit_representations/ tags Cellular automata, Neural cellular automataNotes on: Emergence in artificial life by Gershenson, C. 
(2021)https://hugocisneros.com/notes/gershensonemergenceartificiallife2021/Tue, 03 Aug 2021 10:45:00 +0200https://hugocisneros.com/notes/gershensonemergenceartificiallife2021/tags Artificial life, Emergence source (Gershenson 2021) Summary The paper introduces a complexity metric based on information. Emergence is first measured with Shannon’s information: \[E = - K \sum_{i} p_i \log p_i\] Then the author argues that self-organization can be seen as the opposite of emergence, and measured with \[S = 1 - E\] […] complex systems tend to exhibit both emergence and self-organization. Extreme emergence implies chaos, while extreme self-organization implies immutability.Locality-Sensitive Hashinghttps://hugocisneros.com/notes/locality_sensitive_hashing/Thu, 24 Jun 2021 08:45:00 +0200https://hugocisneros.com/notes/locality_sensitive_hashing/ tags Computer science resources Tyler Neylon’s blogGenerative modellinghttps://hugocisneros.com/notes/generative_modelling/Thu, 17 Jun 2021 10:18:00 +0200https://hugocisneros.com/notes/generative_modelling/ tags Machine learningZuse's thesishttps://hugocisneros.com/notes/zuse_s_thesis/Tue, 15 Jun 2021 09:58:00 +0200https://hugocisneros.com/notes/zuse_s_thesis/tags Physics, Philosophy resources Juergen Schmidhuber’s page, (Schmidhuber 1999) Zuse’s thesis is the idea that the Universe could be running within a digital computer. It was formulated by Konrad Zuse in Rechnender Raum (Calculating Space) in 1969. According to Zuse, the computer could be a very large Cellular automaton.
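The interleaving schedule in the Schmidhuber quote that follows can be made deterministic. At global step $t$ (counting from 1), run one instruction of the program whose index is the number of trailing zeros of $t$ in binary. Program 0 then gets every second step, program 1 gets every second of the remaining steps, and so on. This is a minimal sketch under that assumption. `program_for_step` and `schedule` are hypothetical helper names, and programs are represented only by their indices.

```python
from collections import Counter


def program_for_step(t: int) -> int:
    """Index of the program that receives one instruction at global step t (t >= 1).

    Odd steps go to program 0, steps = 2 mod 4 to program 1, steps = 4 mod 8
    to program 2, etc.: the index is the number of trailing zeros of t.
    """
    if t < 1:
        raise ValueError("steps are counted from 1")
    return (t & -t).bit_length() - 1  # t & -t isolates the lowest set bit


def schedule(num_steps: int) -> Counter:
    """How many instructions each program gets during the first num_steps steps."""
    return Counter(program_for_step(t) for t in range(1, num_steps + 1))


print([program_for_step(t) for t in range(1, 9)])  # [0, 1, 0, 2, 0, 1, 0, 3]
counts = schedule(1024)
print(counts[0], counts[1], counts[2])  # 512 256 128
```

Every program eventually receives infinitely many steps, although program $k$ runs $2^k$ times slower than program 0.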
A computer program to simulate our Universe (and all the others) Systematically create and execute all programs for a universal computer, such as a Turing machine or a CA; the first program is run for one instruction every second step on average, the next for one instruction every second of the remaining steps on average, and so on.Distillationhttps://hugocisneros.com/notes/distillation/Mon, 14 Jun 2021 11:54:00 +0200https://hugocisneros.com/notes/distillation/tags Machine learning, Neural networks, Transfer learning Distillation describes the process of transferring performance from a large trained teacher neural network to an untrained student network. Instead of training the target network to score best according to the task’s loss function, distillation optimizes for the target network to match the output distribution or neuron activation patterns of the teacher network. A review: (Beyer et al. 2021). Bibliography Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov.Transfer learninghttps://hugocisneros.com/notes/transfer_learning/Mon, 14 Jun 2021 11:46:00 +0200https://hugocisneros.com/notes/transfer_learning/ tags Machine learningAutomated theorem provinghttps://hugocisneros.com/notes/automated_theorem_proving/Mon, 14 Jun 2021 11:25:00 +0200https://hugocisneros.com/notes/automated_theorem_proving/ tags Mathematics Machine learning for theorem provingAttentionhttps://hugocisneros.com/notes/attention/Mon, 14 Jun 2021 10:29:00 +0200https://hugocisneros.com/notes/attention/tags Neural networks Implementation Self-attention is a weighted average of all input elements from a sequence, with a weight proportional to a similarity score between representations. The input $x \in \mathbb{R}^{L \times F}$ is projected by matrices $W_Q \in \mathbb{R}^{F \times D}$, $W_K \in \mathbb{R}^{F\times D}$ and $W_V \in \mathbb{R}^{F\times M}$ to representations $Q$ (queries), $K$ (keys) and $V$ (values).
\[ Q = xW_Q\] \[ K = xW_K\] \[ V = xW_V\]Genetic algorithmshttps://hugocisneros.com/notes/genetic_algorithm/Mon, 14 Jun 2021 10:23:00 +0200https://hugocisneros.com/notes/genetic_algorithm/Genetic algorithms can be used as optimization algorithms for search problems where usual optimization techniques, such as gradient-based ones, aren’t very effective. These methods are loosely based on evolution in biological life, implementing a limited form of variation and selection to progress towards better fitness (measured by a specific fitness function). New candidate solutions for a problem are constructed by randomly combining and mutating parent solutions. The best candidates are kept and become parents of the next generation.Alternative learning mechanismshttps://hugocisneros.com/notes/alternative_learning_mechanisms/Mon, 14 Jun 2021 10:06:00 +0200https://hugocisneros.com/notes/alternative_learning_mechanisms/tags Machine learning Many people, including Geoffrey Hinton, have raised concerns about the back-propagation algorithm and the fact that it’s likely not a promising way to achieve Artificial Intelligence (see this Axios blog post). Alternative mechanisms for learning have been, and still are, studied in an attempt to approach the learning problem in a more effective way. Direct feedback alignment (Nøkland 2016) Hebbian learning The theory is sometimes summarized as “Cells that fire together wire together.Roam researchhttps://hugocisneros.com/notes/roam_research/Tue, 18 May 2021 15:03:00 +0200https://hugocisneros.com/notes/roam_research/ tags WritingOrg-modehttps://hugocisneros.com/notes/org_mode/Tue, 18 May 2021 15:00:00 +0200https://hugocisneros.com/notes/org_mode/tags Emacs, Writing Org is a markup language similar to Markdown. It was designed to be used in the Emacs editor, which offers special features for working with files in the Org format. Org mode can be used as an agenda, task manager, writing and publishing tool, and many other things.
Extensions in the form of Emacs packages offer even more features to make org-mode more powerful.Church-Turing thesishttps://hugocisneros.com/notes/church_turing_thesis/Sun, 16 May 2021 14:56:00 +0200https://hugocisneros.com/notes/church_turing_thesis/tags Computability theory A function on the natural numbers can be computed effectively if and only if it can be computed by a Turing Machine (or any equivalent computational model). Implications for Zuse’s thesis An interesting implication of the Church-Turing thesis is that any Turing-complete computational model could in theory be “computing” our Universe. However, the constant overhead of such a simulation differs greatly from one model to another. There must be an optimal or close to optimal computational model for simulating life processes and it seems from everyday observation that it should be inherently parallel.Notes on: More Is Different by Anderson, P. W. (1972)https://hugocisneros.com/notes/andersonmoredifferent1972/Sun, 16 May 2021 14:37:00 +0200https://hugocisneros.com/notes/andersonmoredifferent1972/tags Complexity, Philosophy source (Anderson 1972) This is a foundational paper discussing the fundamental laws of Physics and their relations with complexity. Reductionism doesn’t imply constructionism It is generally accepted that the fundamental laws governing our Universe are relatively simple. We feel we understand many of these laws quite well. However, understanding these fundamental laws is far from enough to actually describe and reconstruct all phenomena we witness.
The main fallacy in this kind of thinking is that the reductionist hypothesis does not by any means imply a “constructionist” one: The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe.Emergence in artificial lifehttps://hugocisneros.com/notes/emergence_in_artificial_life/Sun, 16 May 2021 14:23:00 +0200https://hugocisneros.com/notes/emergence_in_artificial_life/ tags Artificial life, Emergence resources (Gershenson 2021) Bibliography Carlos Gershenson. April 30, 2021. "Emergence in Artificial Life". Arxiv:2105.03216 [physics]. http://arxiv.org/abs/2105.03216. See notesNotes on: Hopfield Networks is All You Need by Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Gruber, L., Holzleitner, M., … (2020)https://hugocisneros.com/notes/ramsauerhopfieldnetworksall2020/Wed, 05 May 2021 16:26:00 +0200https://hugocisneros.com/notes/ramsauerhopfieldnetworksall2020/ tags Hopfield Networks, Attention source (Ramsauer et al. 2020) resources Blog post Summary This quote summarizes the paper well: “In order to integrate modern Hopfield networks into deep learning architectures, we have to make them continuous”. Comments Bibliography Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, et al.. July 16, 2020. "Hopfield Networks Is All You Need". Arxiv:2008.02217 [cs, Stat]. http://arxiv.org/abs/2008.02217.Lempel-Ziv-Welch algorithmhttps://hugocisneros.com/notes/lempel_ziv_welch_algorithm/Tue, 04 May 2021 09:22:00 +0200https://hugocisneros.com/notes/lempel_ziv_welch_algorithm/tags Compression, Complexity papers (Lempel, Ziv 1976; Ziv, Lempel 1977; Ziv, Lempel 1978; Welch 1984; Storer, Szymanski 1982) Context The LZW algorithm was originally designed as a complexity (“randomness”) metric for finite sequences (Lempel, Ziv 1976). 
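The 1976 measure parses a sequence into phrases and counts them. Each new phrase is the longest prefix of the remaining input that can be copied from what came before, with overlap allowed, extended by one new symbol. This is a minimal, deliberately naive Python sketch of that counting. `lz76_complexity` is a hypothetical name, and the code favors readability over the much faster implementations that exist.

```python
def lz76_complexity(s: str) -> int:
    """Number of phrases in the Lempel-Ziv (1976) exhaustive parsing of s."""
    n = len(s)
    complexity = 0
    pos = 0
    while pos < n:
        length = 1
        # Grow the phrase while s[pos:pos+length] is reproducible from the
        # history s[:pos+length-1], which may overlap the phrase itself.
        while pos + length <= n and s[pos:pos + length] in s[:pos + length - 1]:
            length += 1
        complexity += 1  # the phrase ends with one new symbol (or the input ends)
        pos += length
    return complexity


print(lz76_complexity("0001101001000101"))  # 6: 0 | 001 | 10 | 100 | 1000 | 101
print(lz76_complexity("0000000000"))        # 2: 0 | 000000000
```

Random-looking sequences need many phrases and highly regular ones need few, which is why the LZ76 phrase count works as a randomness measure.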
The same authors then turned this idea into the compression algorithms LZ77 (Ziv, Lempel 1977) and LZ78 (Ziv, Lempel 1978). These two are the basis of many well-known and widely used compression formats and utilities, such as GIF and compress (LZW (Welch 1984)) or DEFLATE and gzip (LZSS (Storer, Szymanski 1982)).Notes on: Generalization over different cellular automata rules learned by a deep feed-forward neural network by Aach, M., Goebbert, J. H., & Jitsev, J. (2021)https://hugocisneros.com/notes/aachgeneralizationdifferentcellular2021/Mon, 26 Apr 2021 20:53:00 +0200https://hugocisneros.com/notes/aachgeneralizationdifferentcellular2021/source (Aach et al. 2021) tags Cellular automata, Neural networks Summary This paper studies the generalization abilities of neural networks on tasks involving learning the dynamics of cellular automata rules from examples. Neural networks are trained to predict the next state of a CA from the three previous timesteps. Different training examples for a single rule correspond to different initializations. The authors study three kinds of generalization: Simple generalization: The network is trained on 300 different CA rules and tested on new, unseen initial configurations from those 300 rules.Generalization in Machine learninghttps://hugocisneros.com/notes/generalization_in_machine_learning/Fri, 23 Apr 2021 11:26:00 +0200https://hugocisneros.com/notes/generalization_in_machine_learning/ tags Machine learning, Applied mathsNotes on: AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence by Clune, J.
(2019)https://hugocisneros.com/notes/cluneaigasaigeneratingalgorithms2019/Fri, 23 Apr 2021 11:13:00 +0200https://hugocisneros.com/notes/cluneaigasaigeneratingalgorithms2019/tags Artificial Intelligence, Genetic algorithms, Open-ended Evolution source (Clune 2019) Summary Nowadays, the design of AI systems is approached in one main way (more or less): the implementation of elementary building blocks such as convolutions, skip connections, activation functions, attention, etc. We currently have no clear idea how to combine these relatively successful blocks into a global system that would take advantage of each and every one of them.Matrix factorizationhttps://hugocisneros.com/notes/matrix_factorization/Wed, 21 Apr 2021 15:34:00 +0200https://hugocisneros.com/notes/matrix_factorization/tags Mathematics LU Factorization resources Nick Higham’s blog An LU factorization of an $n \times n$ matrix $A$ is a factorization $A = LU$, where $L$ is lower triangular and $U$ is upper triangular. LUP factorization LU factorization with partial pivoting: $PA = LU$ with $P$ a permutation matrix. LDU factorization Lower-diagonal-upper factorization: $A = LDU$ with $D$ a diagonal matrix and $L$ and $U$ unit triangular (triangular with ones on the diagonal).Attractor networkshttps://hugocisneros.com/notes/attractor_networks/Mon, 19 Apr 2021 11:24:00 +0200https://hugocisneros.com/notes/attractor_networks/tags Physics, Applied maths, Neural networks resources Scholarpedia Attractor networks are sets of nodes connected in such a way that their dynamics are stable in a small subspace of their phase space. The network state usually resides on this smaller manifold after a few evolution steps.
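A classical Hopfield network is one standard example of an attractor network: stored patterns become fixed points, and nearby states fall into them. This minimal NumPy sketch uses Hebbian weights, synchronous sign updates and ±1 units. The pattern size, the two orthogonal patterns and the helper names are illustrative assumptions, not taken from a specific source. It stores two patterns and recovers one of them from a corrupted copy.

```python
import numpy as np


def hebbian_weights(patterns: np.ndarray) -> np.ndarray:
    """Hopfield weights W = (1/N) * sum_p p p^T, with no self-connections."""
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w


def recall(w: np.ndarray, state: np.ndarray, max_steps: int = 20) -> np.ndarray:
    """Apply synchronous sign updates until the state stops changing."""
    s = state.copy()
    for _ in range(max_steps):
        new_s = np.where(w @ s >= 0, 1, -1)
        if np.array_equal(new_s, s):
            break  # fixed point reached: the state sits on an attractor
        s = new_s
    return s


p1 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
p2 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
w = hebbian_weights(np.stack([p1, p2]))

noisy = p1.copy()
noisy[0] = -noisy[0]  # corrupt one unit
print(recall(w, noisy))  # falls back to p1
```

Synchronous updates can end in 2-cycles in general. Asynchronous updates (one unit at a time) are the usual way to guarantee convergence to a fixed point.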
These networks are often recurrent.Boolean networkshttps://hugocisneros.com/notes/boolean_networks/Fri, 16 Apr 2021 14:46:00 +0200https://hugocisneros.com/notes/boolean_networks/tags Complex Systems A generalization of Cellular automata Boolean networks could be seen as a generalization of CA to any topology (not necessarily 1D or 2D). In the standard model, each node of the network is assigned a rule randomly chosen from the $2^{2^K}$ possible ones with $K$ inputs. As with cellular automata, cells (or nodes) don’t have to be in just two states (although the name Boolean no longer holds) and updates can be done either synchronously or asynchronously.Avidahttps://hugocisneros.com/notes/avida/Mon, 12 Apr 2021 14:57:00 +0200https://hugocisneros.com/notes/avida/tags Artificial life, Evolution Avida is an Artificial life system inspired by Tierra which uses computer programs as individuals. One interesting advantage of this system is that the complexity of organisms is easy to measure. This is done by counting the number of instructions in their computer program.Hash functionshttps://hugocisneros.com/notes/hash_functions/Mon, 12 Apr 2021 14:57:00 +0200https://hugocisneros.com/notes/hash_functions/tags Computer science Hash functions map variable-sized inputs to a finite set of outputs. They need to have a range of properties such as: Determinism: A given input should always produce the same output. Universality: Two distinct inputs should get the same hash with a probability as close to $1/n$ as possible, where $n$ is the size of the output set. This is the lowest collision probability one can generally hope for.Illegal numbershttps://hugocisneros.com/notes/illegal_numbers/Mon, 12 Apr 2021 14:29:00 +0200https://hugocisneros.com/notes/illegal_numbers/tags Cryptography, Computer science resources Wikipedia A number that represents some information which is illegal to possess or transmit, making said number technically illegal.
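The idea rests on the fact that any digital file is a sequence of bytes, and therefore also one (usually huge) integer. This minimal Python round trip uses the built-in `int.from_bytes` and `int.to_bytes` methods on harmless example data. The helper names are hypothetical.

```python
def bytes_to_number(data: bytes) -> int:
    """Interpret a byte string as one big-endian non-negative integer."""
    return int.from_bytes(data, byteorder="big")


def number_to_bytes(number: int, length: int) -> bytes:
    """Recover the bytes; length is needed to restore any leading zero bytes."""
    return number.to_bytes(length, byteorder="big")


message = b"hi"
n = bytes_to_number(message)
print(n)                                 # 26729 (0x6869)
print(number_to_bytes(n, len(message)))  # b'hi'
```

The mapping is a bijection once the length is fixed. Any file, including one whose content is forbidden, therefore corresponds to exactly one integer, which is what makes an illegal number possible.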
The term can also refer to numbers that have a particular meaning or connotation that a government wishes to censor. When focusing on some specific class of numbers one could create funny illegal numbers such as: Illegal primes Illegal Pythagorean triples Illegal triangular numbers Illegal Fibonacci numbers etc.The Bitter Lessonhttps://hugocisneros.com/notes/the_bitter_lesson/Thu, 08 Apr 2021 10:35:00 +0200https://hugocisneros.com/notes/the_bitter_lesson/tags Machine learning, Artificial Intelligence author Richard Sutton resources Link The Bitter Lesson is a pattern that can be observed in several areas of machine learning: many hard problems involving some form of artificial intelligence have seen dramatic progress at some point in the last 50 years, which was mostly driven by data and computation as opposed to “clever” human engineering. If this trend is a fundamental principle (which is what the article argues) it would mean that most of the time spent on engineering features and task-specific representations is wasted.Richard Suttonhttps://hugocisneros.com/notes/richard_sutton/Thu, 08 Apr 2021 09:58:00 +0200https://hugocisneros.com/notes/richard_sutton/Note-takinghttps://hugocisneros.com/notes/note_taking/Mon, 29 Mar 2021 14:49:00 +0200https://hugocisneros.com/notes/note_taking/ tags Writing Note-taking in Emacs with org-roamNotes on: Intelligence without representation by Brooks, R. A. (1991)https://hugocisneros.com/notes/brooksintelligencerepresentation1991/Mon, 29 Mar 2021 14:35:00 +0200https://hugocisneros.com/notes/brooksintelligencerepresentation1991/tags Artificial Intelligence source (Brooks 1991) Summary What is intelligence Intelligence cannot be thought of as a collection of building blocks that may one day fall into place to form a coherent whole.
The author argues for another approach to building artificially intelligent systems: Build the systems incrementally, with complete systems at each step of the way, to ensure that the pieces and their interfaces are valid. At each step, build intelligent systems that can be let loose in the real world with real sensing and action.Notes on: The geometry of integration in text classification RNNs by Aitken, K., Ramasesh, V. V., Garg, A., Cao, Y., Sussillo, D., & Maheswaranathan, N. (2020)https://hugocisneros.com/notes/aitkengeometryintegrationtext2020/Thu, 25 Mar 2021 10:20:00 +0100https://hugocisneros.com/notes/aitkengeometryintegrationtext2020/tags RNN, NLP source (Aitken et al. 2020) Summary This paper takes a dynamical-systems approach to studying learning in RNNs. Gradient descent optimization in RNNs allows them to learn a simplified form of memory and information processing. The authors use simple text classification tasks to test whether these learned properties can be explained by looking at the state dynamics of RNNs. The RNNs usually behave like attractor networks, with the hidden state lying on a low-dimensional manifold.Computer securityhttps://hugocisneros.com/notes/computer_security/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/computer_security/Some essential components of computer security: CryptographyCPPNhttps://hugocisneros.com/notes/cppn/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/cppn/ tags Neural networks, Genetic algorithms papers (Stanley 2007) resources Wikipedia Bibliography Kenneth O. Stanley. June 6, 2007. "Compositional Pattern Producing Networks: A Novel Abstraction of Development". Genetic Programming and Evolvable Machines 8 (2):131–62.
DOI.Diffusion limited aggregationhttps://hugocisneros.com/notes/diffusion_limited_aggregation/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/diffusion_limited_aggregation/ tags Applied maths, PhysicsEntropyhttps://hugocisneros.com/notes/entropy/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/entropy/tags Complexity metrics references (Shannon, Weaver 1975) For a discrete random variable $X$ with outcomes $x_i$, $P(X=x_i) = P_i$, the entropy or uncertainty function of $X$ is defined as \[ H(X) = -\sum_{i=1}^{N} P_i \log P_i \] Entropy is always non-negative, and is maximized when the uncertainty is maximal, that is when $P_1 = P_2 = \dots = P_N = \frac{1}{N}$; the entropy in that case is $\log N$. Interpretations:Kolmogorov complexityhttps://hugocisneros.com/notes/kolmogorov_complexity/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/kolmogorov_complexity/tags Complexity, Algorithmic Information theory, Computability theory Definition Invariance theorem For two descriptive languages $L_1$ and $L_2$ and their respective associated Kolmogorov complexity functions $K_1$ and $K_2$, there exists a constant $c$, dependent only on $L_1$ and $L_2$, such that \[ \forall s, -c \leq K_1(s) - K_2(s) \leq c \] In other words, the difference between the Kolmogorov complexities of any string in two description languages is bounded by a constant.Minimum description lengthhttps://hugocisneros.com/notes/minimum_description_length/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/minimum_description_length/ tags Complexity, Algorithmic Information theory papers (Grunwald 2007; Grunwald 2004) Bibliography Peter Grunwald. 2007. The Minimum Description Length Principle. Adaptive Computation and Machine Learning. Cambridge, Mass: MIT Press. Peter Grunwald. June 4, 2004. "A Tutorial Introduction to the Minimum Description Length Principle". Arxiv:math/0406077.
http://arxiv.org/abs/math/0406077.Neural tangent kernelhttps://hugocisneros.com/notes/neural_tangent_kernel/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/neural_tangent_kernel/tags Neural networks For a neural network trying to minimize a quadratic loss $L(w) = \frac{1}{2}\|y(w) - \bar{y}\|^2$, the gradient flow can be rewritten from \[ \dot{w} = - \nabla L (w(t)) \] to \[ \dot{w} = - \nabla y(w) (y(w) - \bar{y}) \] Therefore, the time derivative of $y$ is \[ \dot{y}(w) = \nabla y(w)^T \dot{w} = - \nabla y(w)^T \nabla y(w) (y(w) - \bar{y}) \] The NTK is the matrix multiplying the residual $(y(w) - \bar{y})$ in the last expression: $\nabla y(w)^T \nabla y(w)$.Notes on: A model of urban evolution based on innovation diffusion by Raimbault, J. (2020)https://hugocisneros.com/notes/raimbaultmodelurbanevolution2020/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/raimbaultmodelurbanevolution2020/source (Raimbault 2020) tags ALife 2020, Complex Systems, Evolution, Urban science Summary This paper studies the concept of innovation diffusion and how it could be seen as a mechanism of city evolution. The model shows that global integration of cities (a fully connected city graph on a territory) is not optimal for efficiently diffusing innovation that can spontaneously appear in any city. I am interested in taking an ALife-inspired approach to studying cities, as it can show cities as they could be.Notes on: Evolving Neural Networks through Augmenting Topologies by Stanley, K. O., & Miikkulainen, R. (2002)https://hugocisneros.com/notes/stanleyevolvingneuralnetworks2002/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/stanleyevolvingneuralnetworks2002/tags Neural networks, Genetic algorithms, NAS source (Stanley, Miikkulainen 2002) Summary This is the main paper introducing the NEAT system. This system is a direct-encoding approach to neuroevolution (evolution of ANNs). The encoding is based on a genome sequentially specifying each of the connections between modules of the network.
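A connection-gene genome of this kind can be sketched as a list of records. Each record carries a global innovation number, so the genes of two parents can be lined up before crossover. Genes are matching when both parents have the innovation, disjoint when only one parent has it inside the other's range, and excess when it lies beyond that range. This is a minimal Python sketch, not the reference NEAT implementation. `ConnectionGene`, `align` and the example genomes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int  # historical marker shared by genes of common origin


def align(parent_a, parent_b):
    """Split innovation numbers into matching, disjoint and excess genes."""
    genes_a = {g.innovation: g for g in parent_a}
    genes_b = {g.innovation: g for g in parent_b}
    matching = sorted(genes_a.keys() & genes_b.keys())
    # Genes past the smaller of the two maximum innovations are "excess".
    cutoff = min(max(genes_a, default=0), max(genes_b, default=0))
    unmatched = sorted(genes_a.keys() ^ genes_b.keys())
    disjoint = [i for i in unmatched if i <= cutoff]
    excess = [i for i in unmatched if i > cutoff]
    return matching, disjoint, excess


a = [ConnectionGene(1, 3, 0.5, True, 1), ConnectionGene(2, 3, -0.2, True, 2),
     ConnectionGene(1, 4, 0.7, True, 4)]
b = [ConnectionGene(1, 3, 0.1, True, 1), ConnectionGene(2, 3, 0.9, False, 2),
     ConnectionGene(2, 4, 0.3, True, 3), ConnectionGene(4, 3, 0.4, True, 5),
     ConnectionGene(1, 4, -0.6, True, 6)]
print(align(a, b))  # ([1, 2], [3, 4], [5, 6])
```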
Several tricks are used to make it possible to apply GA methods to evolve networks: Historical tracking of genes to be able to align architectures and mate them.Notes on: Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems by Reinke, C., Etcheverry, M., & Oudeyer, P. (2020)https://hugocisneros.com/notes/reinkeintrinsicallymotivateddiscovery2020/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/reinkeintrinsicallymotivateddiscovery2020/source (Reinke et al. 2020) Summary The authors address the problem of automated discovery of diverse self-organized patterns in high-dimensional and complex Game-of-Life-like dynamical systems. They conduct experiments on Lenia. Their goal is to use an IMGEP algorithm to represent interesting patterns and discover them. Problem setting Goal: With a budget of $N$ experiments, maximize diversity of observations. Parameter space $\Theta$ of available parameters $\theta$. An observation space $O$ of observations.Notes on: Network Deconvolution by Ye, C., Evanusa, M., He, H., Mitrokhin, A., Goldstein, T., Yorke, J. A., Fermuller, Cornelia, … (2020)https://hugocisneros.com/notes/yenetworkdeconvolution2020/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/yenetworkdeconvolution2020/tags Convolutional neural networks, Neural network training source (Ye et al. 2020) Summary This paper introduces so-called Network Deconvolution, advertised as a way to remove pixel-wise and channel-wise correlation in deep neural networks. The authors base their new operator on the optimal configuration for $L_2$ linear regression, where gradient descent converges in a single step if and only if: \[ \frac{1}{N}X^T X = I \] where $X$ is the feature matrix and $N$ the number of samples.Notes on: Neuroevolution: from architectures to learning by Floreano, D., Dürr, P., & Mattiussi, C.
(2008)https://hugocisneros.com/notes/floreanoneuroevolutionarchitectureslearning2008/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/floreanoneuroevolutionarchitectureslearning2008/ tags NAS source (Floreano et al. 2008) Summary Comments Bibliography Dario Floreano, Peter Dürr, Claudio Mattiussi. March 1, 2008. "Neuroevolution: From Architectures to Learning". Evolutionary Intelligence 1 (1):47–62. DOI.Notes on: On the expressive power of programming languages by Felleisen, M. (1991)https://hugocisneros.com/notes/felleisenexpressivepowerprogramming1991/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/felleisenexpressivepowerprogramming1991/source (Felleisen 1991) tags Programming languages resources PWL Conf talk Summary Programming languages have different levels of expressiveness. While loops can be used to create for loops, binary if statements can implement multi-if statements, etc. Turing tarpit: once we get to programming languages that are universal, everything can be re-written into anything and the notion of “expressiveness” of programming languages doesn’t make much sense. For a language $L$ and its extension $F + L$ with some features $F$, can we say the second is more expressive than the first?PCAhttps://hugocisneros.com/notes/pca/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/pca/ tags Data representationQuality diversityhttps://hugocisneros.com/notes/quality_diversity/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/quality_diversity/tags Evolution, Reinforcement learning, Search algorithms papers (Pugh et al. 2016; Cully, Demiris 2017) QD is about creating algorithms that favor diversity when searching the solution space. In QD, one needs to both: Measure the quality of a solution Have a way to describe the effect of a solution Solutions in QD are judged on both of these aspects. QD is also a form of novelty search.
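The two ingredients above, a quality measure and a behavior descriptor, are what an archive-based QD algorithm needs. This Python sketch follows the MAP-Elites pattern, one well-known QD algorithm, used here only as an illustration. The descriptor space is cut into a grid, each cell keeps its best solution, and new candidates are mutations of existing elites. The toy fitness, descriptor, grid size and mutation scale are all assumptions chosen for brevity.

```python
import numpy as np


def fitness(x):
    return -float(np.sum(x ** 2))  # quality: higher is better (toy objective)


def descriptor(x, bins=5):
    # Behavior description: which cell of a bins x bins grid over [-1, 1]^2 x falls in.
    cell = np.floor((x + 1.0) / 2.0 * bins).astype(int)
    return tuple(np.clip(cell, 0, bins - 1))


def map_elites(iterations=2000, init=50, sigma=0.2, seed=0):
    rng = np.random.default_rng(seed)
    archive = {}  # cell -> (fitness, solution)

    def try_insert(x):
        x = np.clip(x, -1.0, 1.0)
        cell, f = descriptor(x), fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)  # keep only the elite of each cell

    for _ in range(init):
        try_insert(rng.uniform(-1.0, 1.0, size=2))
    for _ in range(iterations):
        keys = list(archive)
        parent = archive[keys[rng.integers(len(keys))]][1]
        try_insert(parent + rng.normal(0.0, sigma, size=2))
    return archive


archive = map_elites()
print(len(archive), "of 25 cells filled")
```

The archive is what drives diversity. A worse solution is still kept if it is the best one found for its cell.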
Bibliography Justin K.Self-supervised learninghttps://hugocisneros.com/notes/self_supervised_learning/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/self_supervised_learning/tags Machine learning Definition Self-supervised learning (SSL) is a learning paradigm based on the idea of using information contained within the training data to build better representations of it. Self-supervised models are usually trained to predict hidden parts of the input data from its visible parts. SSL in NLP Self-supervised learning has been used for a long time in NLP. In Language modeling, one tries to predict words from previous ones.Turing degreehttps://hugocisneros.com/notes/turing_degree/Thu, 25 Mar 2021 09:58:00 +0100https://hugocisneros.com/notes/turing_degree/tags Computability theory The idea behind Turing degrees is similar to the notion of cardinality of infinite sets ($ℵ_0, ℵ_1, …$) in the world of computation. A Turing degree is an equivalence class under Turing equivalence. Being Turing equivalent for two sets $X$ and $Y$ means that a Turing machine can decide if an element belongs to the set $X$ when it has an oracle for membership in $Y$ (there is a way to formulate the membership problem for $X$ as a problem for $Y$) and reciprocally.Assembly theoryhttps://hugocisneros.com/notes/assembly_theory/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/assembly_theory/tags Complexity metrics papers (Marshall et al. 2019) This complexity metric is based on ideas similar to Logical depth, where instead of just looking at the general process that led to the creation of an object, we also look at the number of elementary steps in that process. Bibliography Stuart M Marshall, Douglas G Moore, Alastair R G Murray, Sara I Walker. July 2019.
"Quantifying the Pathways to Life Using Assembly Spaces"CMA-EShttps://hugocisneros.com/notes/cma_es/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/cma_es/ tags Evolutionary algorithmsNotes on: A Computer Scientist's View of Life, the Universe, and Everything by Schmidhuber, J. (1999)https://hugocisneros.com/notes/schmidhubercomputerscientistview1999/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/schmidhubercomputerscientistview1999/ tags Zuse’s thesis source (Schmidhuber 1999) Summary Comments Bibliography Juergen Schmidhuber. April 13, 1999. "A Computer Scientist's View of Life, the Universe, and Everything". Arxiv:quant-ph/9904050. http://arxiv.org/abs/quant-ph/9904050.Notes on: A new structurally dissolvable self-reproducing loop evolving in a simple cellular automata space by Sayama, H. (1999)https://hugocisneros.com/notes/sayamanewstructurallydissolvable1999/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/sayamanewstructurallydissolvable1999/source (Sayama 1999) tags Cellular automata, Evolution Summary This work presents a simple evolutionary system based on Langton’s self-reproducing loop. This is entirely done with a normal state-transition rule based CA. The initial structure of the loop was modified to catch variations. An interesting consequence of this system evolving is its natural tendency to evolve towards smaller loops despite no stochastic mutation being hard-coded. Bibliography H. Sayama. 1999. "A New Structurally Dissolvable Self-reproducing Loop Evolving in a Simple Cellular Automata Space"Notes on: Adapting to Unseen Environments through Explicit Representation of Context by Tutum, C., & Miikkulainen, R. 
(2020)https://hugocisneros.com/notes/tutumadaptingunseenenvironments2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/tutumadaptingunseenenvironments2020/source (Tutum, Miikkulainen 2020) tags Meta-learning, Reinforcement learning, ALife 2020 Summary This work introduces the idea of Context-Skill networks for continuous RL tasks. Experiments are done on a Flappy Bird-like game. The authors use an LSTM as a context network to make part of the prediction and a feed-forward neural network as a skill network. They are able to demonstrate that in that game, better performance is achieved by using both networks compared to a single one.Notes on: An Integrated Perspective on the Constitutive and Interactive Dimensions of Autonomy by Beer, R. D. (2020)https://hugocisneros.com/notes/beerintegratedperspectiveconstitutive2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/beerintegratedperspectiveconstitutive2020/tags Emergence, Life, Cellular automata source (Beer 2020) Summary Constitution: “How emergent individuals are put together and maintained” Interaction: “How emergent individuals as a whole engage with the environment” The author uses Conway’s Game of Life as a toy model where the cells and the update rule play the role of the physics of the universe, and this physics gives rise to a simple chemistry which can in turn support self-sustaining networks of reactions and some form of biology.Notes on: Cellular automata as convolutional neural networks by Gilpin, W. (2018)https://hugocisneros.com/notes/gilpincellularautomataconvolutional2018/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/gilpincellularautomataconvolutional2018/tags Cellular automata as CNNs source (Gilpin 2018) Summary This is one of the only attempts to represent a CA rule as a CNN that I have come across. The author uses a deep CNN to learn a rule and studies various information-theoretic quantities in the activation patterns to evaluate the complexity of the rules.
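The link between CA rules and convolutions is easy to see on Conway's Game of Life. The neighbor count is a 3x3 convolution with a kernel of ones (center excluded). The rule is then a pointwise function of that count and the current state. This minimal NumPy sketch works on a periodic grid and implements the circular convolution as a sum of shifted copies. It only shows the operation a CNN would have to learn; it is not Gilpin's network.

```python
import numpy as np


def life_step(grid: np.ndarray) -> np.ndarray:
    """One Game of Life update on a torus; grid holds 0/1 integers."""
    # Circular 3x3 convolution with a ones kernel minus the center cell,
    # written as a sum of shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # Pointwise rule: birth with exactly 3 neighbors, survival with 2 or 3.
    born = (grid == 0) & (neighbors == 3)
    survives = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    return (born | survives).astype(int)


blinker = np.zeros((5, 5), dtype=int)
blinker[2, 1:4] = 1       # horizontal bar
step = life_step(blinker)
print(step)               # vertical bar in column 2
```

In CNN terms, the first stage is a fixed convolutional layer and the second is a small nonlinear, per-cell function. A network can approximate that function with 1x1 convolutions.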
Comments I am personally very interested in this paper since it points to a promising direction for creating neural-network-based rules that can be sampled, efficiently stored and applied.Notes on: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data by Bender, E. M., & Koller, A. (2020)https://hugocisneros.com/notes/benderclimbingnlumeaning2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/benderclimbingnlumeaning2020/source (Bender, Koller 2020) tags NLP, Artificial Intelligence, Evaluating NLP Summary The main point of the article could be summarized like so: We argue that the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning. We take the term language model to refer to any system trained only on the task of string prediction, whether it operates over characters, words or sentences, and sequentially or not.Notes on: Combinatory Chemistry: Towards a Simple Model of Emergent Evolution by Kruszewski, G., & Mikolov, T. (2020)https://hugocisneros.com/notes/kruszewskicombinatorychemistrysimple2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/kruszewskicombinatorychemistrysimple2020/tags Artificial life, Combinatory logic source (Kruszewski, Mikolov 2020) Summary This is Kruszewski’s approach to artificial life, based on artificial chemistry. Combinatory logic is used as a basis for this system. Conservation laws are added on top of the set of rules that make combinatory logic Turing complete. This is then used to observe interesting dynamics and pre-life-like processes. Comments Bibliography Germán Kruszewski, Tomas Mikolov. March 17, 2020. "Combinatory Chemistry: Towards a Simple Model of Emergent Evolution"Notes on: Complexity and evolution: What everybody knows by McShea, D. W.
(1991)https://hugocisneros.com/notes/mcsheacomplexityevolutionwhat1991/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/mcsheacomplexityevolutionwhat1991/ tags Evolution, Complexity source (McShea 1991) Summary Comments Bibliography Daniel W. McShea. July 1991. "Complexity and Evolution: What Everybody Knows". Biology & Philosophy 6 (3):303–24. DOI.Notes on: Curiosity-Driven Exploration by Self-Supervised Prediction by Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. (2017)https://hugocisneros.com/notes/pathakcuriositydrivenexplorationselfsupervised2017/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/pathakcuriositydrivenexplorationselfsupervised2017/tags Reinforcement learning source (Pathak et al. 2017) Summary This paper presents a curiosity-based method for training RL agents. These agents are given a reward $r_t$ which is the sum of an intrinsic and an extrinsic reward. The extrinsic reward is mostly (if not always) 0, while the intrinsic reward is constructed progressively during exploration by an Intrinsic Curiosity Module (ICM). The module is illustrated below (figure from the paper). The left part of the figure represents a standard RL setup where actions are taken according to a policy and they affect the state of the agent.Notes on: Developmental mappings and phenotypic complexity by Lehre, P. K., & Haddow, P. C. (2003)https://hugocisneros.com/notes/lehredevelopmentalmappingsphenotypic2003/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/lehredevelopmentalmappingsphenotypic2003/tags Cellular automata source (Lehre, Haddow 2003) Summary The approach of the paper is to use a genotype/phenotype distance correlation plot to study the complexity of a system that is determined by a genotype and exhibits some phenotypic behavior.
This is equivalent to simply plotting the distance between two phenotypes (for CAs, the Hamming distance between the states after 100 iterations starting from a single activated cell) against the distance between the two genotypes (for CAs, the Hamming distance between the rules).Notes on: Diversity preservation in minimal criterion coevolution through resource limitation by Brant, J. C., & Stanley, K. O. (2020)https://hugocisneros.com/notes/brantdiversitypreservationminimal2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/brantdiversitypreservationminimal2020/ source (Brant, Stanley 2020) tags Co-evolution, Evolutionary algorithms Summary Comments Bibliography Jonathan C. Brant, Kenneth O. Stanley. June 25, 2020. "Diversity Preservation in Minimal Criterion Coevolution Through Resource Limitation". In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 58–66. Cancún Mexico: ACM. DOI.Notes on: Drinking from a Firehose: Continual Learning with Web-scale Natural Language by Hu, H., Sener, O., Sha, F., & Koltun, V. (2020)https://hugocisneros.com/notes/hudrinkingfirehosecontinual2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/hudrinkingfirehosecontinual2020/tags Continual learning source (Hu et al. 2020) Summary This paper focuses on the problem of (self-)supervised continual learning with deep neural networks. The Firehose dataset introduced by the authors is a large database of timestamped tweets. The goal is to learn a language model for each user from the dataset, which is called Personalized online language learning (POLL). The authors also introduce a new extension of gradient descent for continual learning.Notes on: Evolved Open-Endedness, Not Open-Ended Evolution by Pattee, H. H., & Sayama, H.
(2019)https://hugocisneros.com/notes/patteeevolvedopenendednessnot2019/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/patteeevolvedopenendednessnot2019/source (Pattee, Sayama 2019) Summary Evolution need not have been inherently open-ended in nature, because from a simple cell evolving in a complex self-organising environment, new mechanisms might have been created by the organisms themselves, effectively rendering them “more” open-ended. Symbolic languages are a striking example of this phenomenon: an open-ended descriptive power where the complexity of the environment is not limiting, because language can refer to itself recursively to build on its complexity.Notes on: Information-Theoretic Probing with Minimum Description Length by Voita, E., & Titov, I. (2020)https://hugocisneros.com/notes/voitainformationtheoreticprobingminimum2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/voitainformationtheoreticprobingminimum2020/ tags Evaluating NLP, Transformers, Minimum description length source (Voita, Titov 2020) Summary Comments Bibliography Elena Voita, Ivan Titov. March 27, 2020. "Information-theoretic Probing with Minimum Description Length". Arxiv:2003.12298 [cs]. http://arxiv.org/abs/2003.12298.Notes on: Learning Transferable Architectures for Scalable Image Recognition by Zoph, B., Vasudevan, V., Shlens, J., & Le, Q. V. (2018)https://hugocisneros.com/notes/zophlearningtransferablearchitectures2018/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/zophlearningtransferablearchitectures2018/tags NAS source (Zoph et al. 2018) Summary This paper is more or less a follow-up to (Zoph, Le 2017), in which the search space is widened while more constraints are added (division into normal cells for processing and reduction cells for pooling/downsampling). Normal cells are stacked $N$ times, resulting in very big architectures.
NASNet is created by searching for those cells, but the actual number of stacked cells and the number of filters of the penultimate layer are searched separately.Notes on: Modeling systems with internal state using evolino by Wierstra, D., Gomez, F. J., & Schmidhuber, J. (2005)https://hugocisneros.com/notes/wierstramodelingsystemsinternal2005/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/wierstramodelingsystemsinternal2005/ tags Genetic algorithms, Recurrent neural networks source (Wierstra et al. 2005) Summary Comments Bibliography Daan Wierstra, Faustino J. Gomez, Jürgen Schmidhuber. 2005. "Modeling Systems with Internal State Using Evolino". In Proceedings of the 2005 Conference on Genetic and Evolutionary Computation - GECCO '05, 1795. Washington DC, USA: ACM Press. DOI.Notes on: Molecule Attention Transformer by Maziarka, Ł., Danel, T., Mucha, S., Rataj, K., Tabor, J., & Jastrzębski, S. (2020)https://hugocisneros.com/notes/maziarkamoleculeattentiontransformer2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/maziarkamoleculeattentiontransformer2020/ tags Neural networks source (Maziarka et al. 2020) Summary Comments Bibliography Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski. February 19, 2020. "Molecule Attention Transformer". Arxiv:2002.08264 [physics, Stat]. http://arxiv.org/abs/2002.08264.Notes on: Neural Architecture Search with Reinforcement Learning by Zoph, B., & Le, Q. V. (2017)https://hugocisneros.com/notes/zophneuralarchitecturesearch2017/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/zophneuralarchitecturesearch2017/tags NAS source (Zoph, Le 2017) Summary This paper introduces the idea of using an RNN controller to generate the operations of a neural network. In a first setting, the authors use this method to construct CNNs.
The controller samples an architecture, the architecture is built and trained, and the controller is rewarded with the maximum validation accuracy of the last 5 epochs, cubed (??). Another experiment uses this exploration method to produce recurrent cells through a complicated model based on a tree of units, for each of which the controller samples an operation.Notes on: Neural Circuit Policies Enabling Auditable Autonomy by Lechner, M., Hasani, R., Amini, A., Henzinger, T. A., Rus, D., & Grosu, R. (2020)https://hugocisneros.com/notes/lechnerneuralcircuitpolicies2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/lechnerneuralcircuitpolicies2020/source (Lechner et al. 2020) tags Neural networks Summary This article introduces a type of RNN called Neural Circuit Policies (NCP). This architecture is said to be inspired by the wiring diagram of the C. elegans nematode. The main building block is a recurrent neural network called liquid time constant (LTC) introduced in (Hasani et al. 2020). LTC Neurons These neurons are bio-inspired. For a given neuron in state $x_i(t)$, the continuous temporal evolution is described by an ODE: \[ \dot{x}_i = - \left(\frac{1}{\tau_i} + \frac{w_{ij}}{C_{m_i}} \sigma_i(x_j) \right) x_i + \left( \frac{x_{\text{leak}_i}}{\tau_i}+ \frac{w_{ij}}{C_{m_i}} \sigma_i(x_j) E_{ij} \right) \]Notes on: On Adversarial Mixup Resynthesis by Beckham, C., Honari, S., Verma, V., Lamb, A., Ghadiri, F., Hjelm, R. D., Bengio, Y., … (2019)https://hugocisneros.com/notes/beckhamadversarialmixupresynthesis2019/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/beckhamadversarialmixupresynthesis2019/ tags Autoencoders source (Beckham et al. 2019) Summary Comments Bibliography Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R. Devon Hjelm, Yoshua Bengio, Christopher Pal. October 23, 2019. "On Adversarial Mixup Resynthesis". Arxiv:1903.02709 [cs, Stat].
http://arxiv.org/abs/1903.02709.Notes on: PCGRL: Procedural Content Generation via Reinforcement Learning by Khalifa, A., Bontrager, P., Earle, S., & Togelius, J. (2020)https://hugocisneros.com/notes/khalifapcgrlproceduralcontent2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/khalifapcgrlproceduralcontent2020/ tags Reinforcement learning source (Khalifa et al. 2020) Summary Comments Bibliography Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius. January 24, 2020. "PCGRL: Procedural Content Generation via Reinforcement Learning". Arxiv:2001.09212 [cs, Stat]. http://arxiv.org/abs/2001.09212.Notes on: POET: open-ended coevolution of environments and their optimized solutions by Wang, R., Lehman, J., Clune, J., & Stanley, K. O. (2019)https://hugocisneros.com/notes/wangpoetopenendedcoevolution2019/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/wangpoetopenendedcoevolution2019/tags Open-ended Evolution, Reinforcement learning source (Wang et al. 2019) Summary This paper introduces the POET architecture. The core idea behind this framework is to build a system that can make agents learn complex behavior through joint evolution of agents and the environment. The better the agent, the more complex the environment we can give it. There are 3 main components to the algorithm: an evolutionary strategy (ES) for the environment itself, resembling a genetic algorithm; another ES for the agents (although these agents might also be trained with RL); and a transfer mechanism whereby agents trained in a particular environment can be trained on another one.Notes on: Regenerating Soft Robots through Neural Cellular Automata by Horibe, K., Walker, K., & Risi, S. (2021)https://hugocisneros.com/notes/horiberegeneratingsoftrobots2021/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/horiberegeneratingsoftrobots2021/ tags Cellular automata, Reinforcement learning source (Horibe et al.
2021) Summary The authors explore neural cellular automata (Mordvintsev et al. 2020) as a framework for growing soft robots. Comments Bibliography Kazuya Horibe, Kathryn Walker, Sebastian Risi. February 7, 2021. "Regenerating Soft Robots Through Neural Cellular Automata". Arxiv:2102.02579 [cs, Q-bio]. http://arxiv.org/abs/2102.02579. Alexander Mordvintsev, Ettore Randazzo, Eyvind Niklasson, Michael Levin. February 11, 2020. "Growing Neural Cellular Automata". Distill 5 (2):e23. DOI. See notesNotes on: Reservoir Computing in Artificial Spin Ice by Jensen, J. H., & Tufte, G. (2020)https://hugocisneros.com/notes/jensenreservoircomputingartificial2020/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/jensenreservoircomputingartificial2020/source (Jensen, Tufte 2020) tags Reservoir computing, Complex Systems Summary This talk is about artificial spin ice. This model is based on a grid of coupled magnets that can be controlled with a magnetic field. The geometry of that grid can greatly change the kind of behavior one may observe in such systems. The authors want to use the spin ice model for reservoir computing. They measure useful quantities such as kernel quality $K$ (ability to separate different inputs) and generalization capability $G$ (the extent to which similar inputs yield similar results).Notes on: Seeking open-ended evolution in Swarm Chemistry by Sayama, H. (2011)https://hugocisneros.com/notes/sayamaseekingopenendedevolution2011/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/sayamaseekingopenendedevolution2011/ tags Open-ended Evolution source (Sayama 2011) Summary Comments Bibliography Hiroki Sayama. April 2011. "Seeking Open-ended Evolution in Swarm Chemistry". In 2011 IEEE Symposium on Artificial Life (ALIFE), 186–93. Paris, France: IEEE. DOI.Notes on: Spontaneous fine-tuning to environment in many-species chemical reaction networks by Horowitz, J. M., & England, J. L.
(2017)https://hugocisneros.com/notes/horowitzspontaneousfinetuningenvironment2017/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/horowitzspontaneousfinetuningenvironment2017/ tags Chemical reaction network, Biological life source (Horowitz, England 2017) Summary Comments Bibliography Jordan M. Horowitz, Jeremy L. England. July 18, 2017. "Spontaneous Fine-tuning to Environment in Many-species Chemical Reaction Networks". Proceedings of the National Academy of Sciences 114 (29). National Academy of Sciences:7565–70. DOI.Notes on: The Architecture of Complexity by Simon, H. A. (1962)https://hugocisneros.com/notes/simonarchitecturecomplexity1962/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/simonarchitecturecomplexity1962/tags Complexity, Complex Systems source (Simon 1962) Complex systems In such systems, the whole is more than the sum of the parts, not in an ultimate, metaphysical sense, but in the important pragmatic sense that, given the properties of the parts and the laws of their interaction, it is not a trivial matter to infer the properties of the whole. In the face of complexity, an in-principle reductionist may be at the same time a pragmatic holist.Notes on: Transition phenomena in cellular automata rule space by Li, W., Packard, N. H., & Langton, C. G. (1990)https://hugocisneros.com/notes/litransitionphenomenacellular1990/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/litransitionphenomenacellular1990/tags Cellular automata source (Li et al. 1990) Summary This foundational paper follows Langton’s work on chaos and the lambda parameter. It uses information-theoretic measures to try and understand the structure of the space of CA rules. 
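Langton's λ parameter, which this line of work builds on, is the fraction of rule-table entries that map to a non-quiescent state. A minimal sketch for the simplest case only, elementary CAs (2 states, radius 1) identified by their Wolfram rule number, assuming state 0 is the quiescent state; the paper itself explores a larger rule space.

```python
def eca_lambda(rule: int) -> float:
    """Langton's lambda for an elementary CA given by its Wolfram rule number.

    Assumes state 0 is quiescent, so lambda is the fraction of the 8 possible
    neighbourhood configurations whose output is 1.
    """
    if not 0 <= rule <= 255:
        raise ValueError("elementary CA rules are numbered 0..255")
    # Bit i of the rule number is the output for neighbourhood configuration i.
    outputs = [(rule >> i) & 1 for i in range(8)]
    return sum(outputs) / 8


print(eca_lambda(110))  # 0.625 (rule 110 = 0b01101110 has five 1 bits)
print(eca_lambda(30))   # 0.5
```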
The authors come up with a classification into 6 classes: (1) spatially homogeneous fixed points, (2) spatially inhomogeneous fixed points, (3) periodic behavior, (4) locally chaotic behavior, (5) chaotic behavior and (6) complex behavior. Wolfram’s class I corresponds to class 1, and class II corresponds to classes 2, 3 and 4.Recurrent neural networkshttps://hugocisneros.com/notes/recurrent_neural_networks/Thu, 25 Mar 2021 09:57:00 +0100https://hugocisneros.com/notes/recurrent_neural_networks/ tags Neural networks, Machine learningAlgorithmic Information theoryhttps://hugocisneros.com/notes/algorithmic_information_theory/Thu, 25 Mar 2021 09:56:00 +0100https://hugocisneros.com/notes/algorithmic_information_theory/Convolutional neural networkshttps://hugocisneros.com/notes/convolutional_neural_networks/Thu, 25 Mar 2021 09:56:00 +0100https://hugocisneros.com/notes/convolutional_neural_networks/ tags Neural networksGödel's theoremhttps://hugocisneros.com/notes/godel_s_theorem/Thu, 25 Mar 2021 09:56:00 +0100https://hugocisneros.com/notes/godel_s_theorem/tags Logic resources Stanford Encyclopedia of Philosophy First incompleteness theorem Any consistent formal system F within which a certain amount of elementary arithmetic can be carried out is incomplete; i.e., there are statements of the language of F which can neither be proved nor disproved in F.
(Panu Raatikainen) This theorem was followed by several closely related theorems, such as Turing’s Halting problemHalting probabilityhttps://hugocisneros.com/notes/halting_probability/Thu, 25 Mar 2021 09:56:00 +0100https://hugocisneros.com/notes/halting_probability/ tags Computability theory, Algorithmic Information theory, Halting problemOrdinary least squareshttps://hugocisneros.com/notes/ordinary_least_squares/Thu, 25 Mar 2021 09:56:00 +0100https://hugocisneros.com/notes/ordinary_least_squares/ tags Applied maths, OptimizationSed utilityhttps://hugocisneros.com/notes/sed_utility/Thu, 25 Mar 2021 09:56:00 +0100https://hugocisneros.com/notes/sed_utility/tags Coding In-place batch file manipulation Delete the same line in many files Let’s start by creating a simple text file with three lines. This is what it looks like: printf 'Hello\nto the\nworld\n' > test.txt cat test.txt Hello to the world We use sed to remove lines in the file matching some regex. The -i.bak option modifies the file in place and keeps a copy of the original as test.txt.bak. sed -i.bak '/to the/d' test.txt cat test.Turing testhttps://hugocisneros.com/notes/turing_test/Wed, 24 Mar 2021 10:40:00 +0100https://hugocisneros.com/notes/turing_test/tags Artificial intelligence test This is probably one of the most famous tests for artificial intelligence.
It was devised by Alan Turing.Network programminghttps://hugocisneros.com/notes/network_programming/Wed, 24 Mar 2021 09:42:00 +0100https://hugocisneros.com/notes/network_programming/tags Programming, Networking An incredible resource for low-level network programming: Beej’s guide to network programming.Networkinghttps://hugocisneros.com/notes/networking/Wed, 24 Mar 2021 09:41:00 +0100https://hugocisneros.com/notes/networking/ tags ProgrammingLambda calculushttps://hugocisneros.com/notes/lambda_calculus/Wed, 03 Mar 2021 14:47:00 +0100https://hugocisneros.com/notes/lambda_calculus/ tags Computer scienceVon Neumann's self-reproducing CAhttps://hugocisneros.com/notes/von_neumann_s_self_reproducing_ca/Wed, 03 Mar 2021 08:45:00 +0100https://hugocisneros.com/notes/von_neumann_s_self_reproducing_ca/ tags Cellular automata, John Von NeumannAutomatic differentiationhttps://hugocisneros.com/notes/automatic_differentiation/Wed, 03 Mar 2021 08:43:00 +0100https://hugocisneros.com/notes/automatic_differentiation/ tags Applied maths, OptimizationWhy programming is a good medium for expressing poorly understood and sloppily-formulated ideashttps://hugocisneros.com/notes/why_programming_is_a_good_medium/Tue, 02 Mar 2021 10:17:00 +0100https://hugocisneros.com/notes/why_programming_is_a_good_medium/source Link tags Artificial Intelligence, Coding author Marvin Minsky What can computers do? The fallacy under discussion is the widespread superstition that we can’t write a computer program to do something unless one has an extremely clear, precise formulation of what is to be done, and exactly how to do it. It is generally believed that computer programs cannot be more than a set of rules and instructions for what to do in a given computer state.Article: Why Sex?
Biologists Find New Explanationshttps://hugocisneros.com/notes/why_sex_biologists_find_new_explanations/Tue, 02 Mar 2021 10:15:00 +0100https://hugocisneros.com/notes/why_sex_biologists_find_new_explanations/ source Link tags Biological life, EvolutionWirth's lawhttps://hugocisneros.com/notes/wirth_s_law/Tue, 02 Mar 2021 10:15:00 +0100https://hugocisneros.com/notes/wirth_s_law/tags Computer science resources Wikipedia It is an adage which states that software is getting slower more rapidly than hardware is becoming faster.3-SAThttps://hugocisneros.com/notes/3_sat/Tue, 02 Mar 2021 10:01:00 +0100https://hugocisneros.com/notes/3_sat/ tags LogicGraphshttps://hugocisneros.com/notes/graphs/Wed, 24 Feb 2021 11:33:00 +0100https://hugocisneros.com/notes/graphs/ tags Mathematics, Computer scienceMessage-passing graph networkshttps://hugocisneros.com/notes/message_passing_graph_networks/Wed, 24 Feb 2021 11:33:00 +0100https://hugocisneros.com/notes/message_passing_graph_networks/ tags Graph neural networksAttention graph networkshttps://hugocisneros.com/notes/attention_graph_networks/Wed, 24 Feb 2021 11:32:00 +0100https://hugocisneros.com/notes/attention_graph_networks/ tags Graph neural networks, AttentionCompilationhttps://hugocisneros.com/notes/compilation/Mon, 22 Feb 2021 13:44:00 +0100https://hugocisneros.com/notes/compilation/ tags Computer science Compilation is the act of converting code from one programming language to another. 
Some compiled languages C Programming language Rust C++Regular expressionshttps://hugocisneros.com/notes/regular_expressions/Tue, 16 Feb 2021 21:10:00 +0100https://hugocisneros.com/notes/regular_expressions/ tags Coding, NLPNovelty searchhttps://hugocisneros.com/notes/novelty_search/Thu, 11 Feb 2021 09:34:00 +0100https://hugocisneros.com/notes/novelty_search/ tags SearchSearch algorithmshttps://hugocisneros.com/notes/search_algorithms/Wed, 10 Feb 2021 15:16:00 +0100https://hugocisneros.com/notes/search_algorithms/ tags AlgorithmTalk: The Importance of Open-Endedness in AI and Machine Learninghttps://hugocisneros.com/notes/talk_the_importance_of_open_endedness_in_ai_and_machine_learning/Wed, 10 Feb 2021 11:03:00 +0100https://hugocisneros.com/notes/talk_the_importance_of_open_endedness_in_ai_and_machine_learning/tags Open-ended Evolution, Artificial Intelligence speaker Kenneth Stanley source Youtube Why should we care about open-endedness? There is nothing you can point to that would be worth coming back to a billion years from now to see what happened. And yet, we are inside of such a system and such a system produced us. Evolution is a seemingly open-ended process for which we only have access to a single run’s current and past results.Picbreederhttps://hugocisneros.com/notes/picbreeder/Wed, 10 Feb 2021 11:02:00 +0100https://hugocisneros.com/notes/picbreeder/ tags Search, Open-ended EvolutionNyström methodhttps://hugocisneros.com/notes/nystrom_method/Wed, 10 Feb 2021 09:36:00 +0100https://hugocisneros.com/notes/nystrom_method/ tags Applied maths This method was introduced as a way to speed up kernel machines in (Williams, Seeger 2001). Bibliography Christopher Williams, Matthias Seeger. 2001. "Using the Nyström Method to Speed up Kernel Machines". In Advances in Neural Information Processing Systems, edited by T. Leen, T. Dietterich, and V. Tresp, 13:682–88. MIT Press.
https://proceedings.neurips.cc/paper/2000/file/19de10adbaa1b2ee13f77f679fa1483a-Paper.pdf.Kernel Machinehttps://hugocisneros.com/notes/kernel_machine/Fri, 05 Feb 2021 13:48:00 +0100https://hugocisneros.com/notes/kernel_machine/ tags Kernel MethodsGradient flowhttps://hugocisneros.com/notes/gradient_flow/Wed, 09 Dec 2020 14:11:00 +0100https://hugocisneros.com/notes/gradient_flow/tags Gradient descent, Optimization The gradient flow for a model parametrized by parameters $w$ and a loss function $L$ is written: \[ \dot{w} = - \nabla L (w(t)) \]Privacy-preserving machine learninghttps://hugocisneros.com/notes/privacy_preserving_machine_learning/Wed, 02 Dec 2020 11:16:00 +0100https://hugocisneros.com/notes/privacy_preserving_machine_learning/tags Machine learning, Online privacy This is a kind of machine learning where one wants to train a model or perform inference without transmitting sensitive information. This information could leak because of data transmission to an untrusted computing server, or because the model itself reveals the structure of its training data (Ateniese et al. 2013; Song et al. 2017). Bibliography Giuseppe Ateniese, Giovanni Felici, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali.Homomorphic encryptionhttps://hugocisneros.com/notes/fully_homomorphic_encryption/Wed, 02 Dec 2020 10:28:00 +0100https://hugocisneros.com/notes/fully_homomorphic_encryption/tags Cryptography resources Vitalik Buterin’s blog Principle The idea of homomorphic encryption is to encrypt data in such a way that, given a function $f$ and a message to encrypt $x$, $f$ can be evaluated directly on the ciphertext $\text{enc}(x)$ and the result decrypts to $f(x)$, i.e. $\text{dec}(f(\text{enc}(x))) = f(x)$. This idea is similar in spirit to Privacy-preserving machine learning, or federated learning, where one wants to obfuscate data while still being able to use it in a learning model.
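Fully homomorphic schemes support arbitrary functions, but the basic principle already shows up in a much weaker, partially homomorphic case: textbook (unpadded) RSA, where multiplying two ciphertexts yields an encryption of the product of the plaintexts. A minimal sketch with the usual toy parameters $p = 61$, $q = 53$ (an illustration only: these keys are trivially breakable and unpadded RSA is insecure). It uses `pow(e, -1, m)` for the modular inverse, which requires Python 3.8+.

```python
# Toy textbook-RSA keys (illustration only, never use for real encryption).
P, Q = 61, 53
N = P * Q                       # public modulus, 3233
PHI = (P - 1) * (Q - 1)         # Euler's totient of N, 3120
E = 17                          # public exponent, coprime with PHI
D = pow(E, -1, PHI)             # private exponent (modular inverse), 2753


def enc(m: int) -> int:
    """Encrypt a message 0 <= m < N."""
    return pow(m, E, N)


def dec(c: int) -> int:
    """Decrypt a ciphertext 0 <= c < N."""
    return pow(c, D, N)


a, b = 7, 11
# Multiply the two ciphertexts without ever decrypting them...
c_prod = (enc(a) * enc(b)) % N
# ...and the result decrypts to the product of the plaintexts (mod N).
print(dec(c_prod))  # 77
```

The same trick does not work for addition with RSA, which is why schemes that support both operations (and hence arbitrary circuits) are considered a separate, much harder problem.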
Here, one considers arbitrary functions.Graph convolutional networkshttps://hugocisneros.com/notes/graph_convolutional_networks/Tue, 01 Dec 2020 15:34:00 +0100https://hugocisneros.com/notes/graph_convolutional_networks/ tags Convolutional neural networks, Graph neural networksComplexity of cellular automatahttps://hugocisneros.com/notes/complexity_of_cellular_automata/Wed, 25 Nov 2020 09:20:00 +0100https://hugocisneros.com/notes/complexity_of_cellular_automata/ tags Complexity, Cellular automata Measuring complexity created by cellular automata is a vast subject. Using Entropy In (Wuensche 1999), the author uses the entropy of rule table lookup frequencies to evaluate the complexity of a CA. Bibliography Andrew Wuensche. 1999. "Classifying Cellular Automata Automatically: Finding Gliders, Filtering, and Relating Space-time Patterns, Attractor Basins, and the Z Parameter". Complexity 4 (3):47–66. DOI.Rainbow tableshttps://hugocisneros.com/notes/rainbow_tables/Wed, 25 Nov 2020 09:14:00 +0100https://hugocisneros.com/notes/rainbow_tables/ tags Cryptography resources How Rainbow tables workSelf-replicationhttps://hugocisneros.com/notes/self_replication/Thu, 12 Nov 2020 11:49:00 +0100https://hugocisneros.com/notes/self_replication/ tags Complexity, Self-organization An early example of artificial self-replication is Von Neumann’s self-reproducing CA which is a cellular automaton. Self-replication in neural networks can be done with neural network quines (Chang, Lipson 2018). Bibliography Oscar Chang, Hod Lipson. May 24, 2018. "Neural Network Quine". Arxiv:1803.05859 [cs]. http://arxiv.org/abs/1803.05859.Dirichlet energyhttps://hugocisneros.com/notes/dirichlet_energy/Tue, 10 Nov 2020 09:58:00 +0100https://hugocisneros.com/notes/dirichlet_energy/The Dirichlet energy of a differentiable function $f$ on $\mathbb{R}^d$ is the squared $L^2$ norm of its gradient, usually written with a factor $\frac{1}{2}$: $E[f] = \frac{1}{2}\int \|\nabla f\|^2 \, dx$.
On a graph, such as the 2D grid of a cellular automaton or a Hopfield network, this can be discretized as the sum, over all edges, of the squared difference between the values at the two endpoints.Hadamard producthttps://hugocisneros.com/notes/hadamard_product/Thu, 05 Nov 2020 10:23:00 +0100https://hugocisneros.com/notes/hadamard_product/The Hadamard product is a mathematical name for element-wise multiplication of matrices.Waveletshttps://hugocisneros.com/notes/wavelets/Wed, 04 Nov 2020 09:23:00 +0100https://hugocisneros.com/notes/wavelets/ tags Applied maths, Signal processing Wavelets are functions with specific properties that make them useful when dealing with images. They are used for lossy image compression. Types of wavelets Haar wavelets Daubechies waveletsSignal processinghttps://hugocisneros.com/notes/signal_processing/Wed, 04 Nov 2020 09:18:00 +0100https://hugocisneros.com/notes/signal_processing/ tags Applied mathsImage processinghttps://hugocisneros.com/notes/image_processing/Wed, 04 Nov 2020 09:17:00 +0100https://hugocisneros.com/notes/image_processing/tags Applied maths, Signal processing Scale an image with no interpolation ImageMagick documentation convert source.[png|gif|...] -scale 400% target.[png|gif|...] The scale option can also take integer parameters (without the percent sign) to indicate the target size. Remove metadata from an image Useful for preserving Online privacy when publishing images.
Pictures taken with smartphones and other modern devices often contain a large amount of metadata about location, date and time, and device type.Neurosciencehttps://hugocisneros.com/notes/neuroscience/Mon, 02 Nov 2020 15:58:00 +0100https://hugocisneros.com/notes/neuroscience/ tags Biological life, Artificial IntelligenceDynamical systemshttps://hugocisneros.com/notes/dynamical_systems/Tue, 27 Oct 2020 20:26:00 +0100https://hugocisneros.com/notes/dynamical_systems/ tags Applied maths, PhysicsStable marriage problemhttps://hugocisneros.com/notes/stable_marriage_problem/Thu, 22 Oct 2020 09:42:00 +0200https://hugocisneros.com/notes/stable_marriage_problem/ tags AlgorithmTuring completeness of cellular automatahttps://hugocisneros.com/notes/turing_completeness_of_cellular_automata/Tue, 20 Oct 2020 08:20:00 +0200https://hugocisneros.com/notes/turing_completeness_of_cellular_automata/ tags Cellular automata, Turing-completeness Rule 110 Elementary cellular automaton rule 110 is universal (Cook 2004). Game of Life Conway’s Game of Life has also been shown to be Turing-complete. Gliders can be used to implement logic gates. A working computer in Game of Life Bibliography Matthew Cook. 2004. "Universality in Elementary Cellular Automata". Complex Systems, 40.Reversible cellular automatahttps://hugocisneros.com/notes/reversible_cellular_automata/Fri, 16 Oct 2020 14:11:00 +0200https://hugocisneros.com/notes/reversible_cellular_automata/ tags Cellular automata Second-order CA Block CARaven's progressive matriceshttps://hugocisneros.com/notes/raven_s_progressive_matrices/Fri, 16 Oct 2020 10:57:00 +0200https://hugocisneros.com/notes/raven_s_progressive_matrices/tags Artificial intelligence test It is a visual test used to assess abstract reasoning. The patterns are usually matrices of symbols, between 2x2 and 6x6 in size. One of these symbols is usually left blank and must be deduced from the others.
The overall concept is very similar to the Abstraction and Reasoning Corpus but is much more tied to human vision. This makes the range of possible tasks much larger but also harder to integrate in an algorithm.Turing Machinehttps://hugocisneros.com/notes/turing_machine/Mon, 05 Oct 2020 08:07:00 +0200https://hugocisneros.com/notes/turing_machine/tags Computability theory, Computer science resources Wikipedia The machine was invented by Alan Turing in 1936. General definition A Turing Machine is usually composed of four main components: A tape divided into cells. This tape is the way the machine reads inputs, writes outputs and manipulates information (storing it, moving it, etc.). Each cell can contain any symbol of a predefined alphabet. It is also often presented as infinitely long on both sides.Alan Turinghttps://hugocisneros.com/notes/alan_turing/Mon, 05 Oct 2020 08:06:00 +0200https://hugocisneros.com/notes/alan_turing/ tags Computer science, Artificial Intelligence, CryptographyOptimizationhttps://hugocisneros.com/notes/optimization/Fri, 02 Oct 2020 16:43:00 +0200https://hugocisneros.com/notes/optimization/ tags Mathematics, Applied mathsEcho-state networkshttps://hugocisneros.com/notes/echo_state_networks/Fri, 02 Oct 2020 16:17:00 +0200https://hugocisneros.com/notes/echo_state_networks/tags Recurrent neural networks, Unsupervised learning resources Scholarpedia Principle An echo state network is usually a standard RNN with fixed random weights. The output from this RNN is used as a high dimensional feature map to be fed into a machine learning system. (Jaeger 2004; Jaeger et al. 2007; Jaeger 2012) Bibliography H. Jaeger. April 2, 2004. "Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication". 
Science 304 (5667):78–80.Unsupervised learninghttps://hugocisneros.com/notes/unsupervised_learning/Fri, 02 Oct 2020 09:16:00 +0200https://hugocisneros.com/notes/unsupervised_learning/ tags Machine learningPythonhttps://hugocisneros.com/notes/python/Mon, 28 Sep 2020 10:32:00 +0200https://hugocisneros.com/notes/python/tags Programming languages, Coding Code tips Categories to one-hot This is a handy technique, but it can be very memory-intensive for large arrays, since it builds a dense array with one column per category. import numpy as np a = np.random.randint(5, size=10) one_hot_a = np.eye(5)[a] Side-output for jupyter notebooks Insert the following block in a notebook cell and execute as code (From Twitter). This will put the output of each cell on the side of the code. %%html <style> #notebook-container {width: 100%; background-color: #EEE} .Self-organizationhttps://hugocisneros.com/notes/self_organization/Wed, 23 Sep 2020 16:10:00 +0200https://hugocisneros.com/notes/self_organization/Self-organization is an emergent phenomenon Open questions in self-organization From (Gershenson et al. 2020) How can self-organization be programmed? This question is fundamental. It is one thing to have systems that exhibit beautiful and surprising self-organization, but it's a different thing to be able to steer it in the right direction and use it. Can the macroscopic outcomes of self-organization be predicted? What is the role of self-organization in the open problems of ALife?Optimal controlhttps://hugocisneros.com/notes/optimal_control/Wed, 23 Sep 2020 11:56:00 +0200https://hugocisneros.com/notes/optimal_control/tags Applied maths resources Book by Daniel Liberzon Optimal control problem A typical optimal control problem starts with a control system \[ \dot{x} = f(t, x, u), \quad x(t_0) = x_0 \] where $x$ is the state of the system, $t$ represents time and $u$ is the control input.
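Once a control input $u(t)$ is fixed, the control system above is just an ODE, and its state trajectory can be obtained by numerical integration. A minimal sketch using forward Euler, with a hypothetical scalar system $\dot{x} = -x + u$ and the constant control $u(t) = 1$ (both are illustrative assumptions, not taken from the book); the exact solution from $x_0 = 0$ is $x(t) = 1 - e^{-t}$.

```python
import math
from typing import Callable, List


def simulate(f: Callable[[float, float, float], float],
             u: Callable[[float], float],
             x0: float, t0: float, tf: float,
             n_steps: int = 1000) -> List[float]:
    """Forward-Euler integration of x' = f(t, x, u(t)) with x(t0) = x0.

    Returns the states x(t0), x(t0 + dt), ..., x(tf).
    """
    dt = (tf - t0) / n_steps
    t, x = t0, x0
    xs = [x]
    for _ in range(n_steps):
        x = x + dt * f(t, x, u(t))  # one explicit Euler step
        t += dt
        xs.append(x)
    return xs


# Hypothetical example: x' = -x + u with the constant control u(t) = 1.
xs = simulate(lambda t, x, u: -x + u, lambda t: 1.0, x0=0.0, t0=0.0, tf=5.0)
print(xs[-1], 1 - math.exp(-5))  # Euler approximation vs. exact value at t = 5
```

Forward Euler is used only for readability; a higher-order integrator would be a more common choice in practice.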
The goal of an OC problem is to minimize a cost functional of the form \[ J(u) := \int_{t_0}^{t_f} L(t, x(t), u(t))\,dt + K(t_f, x_f) \]Roboticshttps://hugocisneros.com/notes/robotics/Mon, 21 Sep 2020 22:59:00 +0200https://hugocisneros.com/notes/robotics/ tags Artificial life, Artificial IntelligenceArtificial intelligence testhttps://hugocisneros.com/notes/artificial_intelligence_test/Wed, 16 Sep 2020 10:56:00 +0200https://hugocisneros.com/notes/artificial_intelligence_test/tags Artificial Intelligence The most famous example is the Turing test.Chinese room experimenthttps://hugocisneros.com/notes/chinese_room_experiment/Wed, 16 Sep 2020 10:56:00 +0200https://hugocisneros.com/notes/chinese_room_experiment/ tags Artificial intelligence testFast Marching methodhttps://hugocisneros.com/notes/fast_marching_method/Mon, 07 Sep 2020 10:30:00 +0200https://hugocisneros.com/notes/fast_marching_method/tags Applied maths, Algorithm The fast marching method can be seen as a way to fix the metric issue of Dijkstra’s algorithm, which on a 4-connected grid actually computes the $\ell_1$ distance rather than the Euclidean one. In the FM method, the graph update is replaced with a local resolution of the Eikonal equation. This reduces the bias introduced by the grid, and the computed distance converges towards the underlying geodesic distance as the grid step size tends to 0.
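A minimal Python sketch of first-order fast marching on a 2-D grid with unit spacing and 4-connected neighbours, assuming the local cost $W$ is given as a list of lists (the function names are made up for illustration). The priority-queue loop is the same as in Dijkstra's algorithm; only the update changes: instead of $D_j \leftarrow \min_{k \sim j} D_k + W_j$, each tentative value solves the upwind discretization $(D - a)^2 + (D - b)^2 = W^2$ of the Eikonal equation, where $a$ and $b$ are the smallest already-accepted horizontal and vertical neighbour values.

```python
import heapq
import math


def _eikonal_update(D, frozen, W, i, j):
    """First-order upwind solve of |grad D| = W at cell (i, j), grid step 1.

    Only accepted ("frozen") neighbours are used, as in standard fast marching.
    """
    n, m = len(D), len(D[0])

    def accepted(r, c):
        if 0 <= r < n and 0 <= c < m and frozen[r][c]:
            return D[r][c]
        return math.inf

    a = min(accepted(i, j - 1), accepted(i, j + 1))  # best horizontal neighbour
    b = min(accepted(i - 1, j), accepted(i + 1, j))  # best vertical neighbour
    w = W[i][j]
    if math.isinf(a) and math.isinf(b):
        return math.inf
    if abs(a - b) >= w:
        # Only one direction contributes: same update as Dijkstra's algorithm.
        return min(a, b) + w
    # Both directions contribute: solve (D - a)^2 + (D - b)^2 = w^2 for D >= max(a, b).
    return (a + b + math.sqrt(2 * w * w - (a - b) ** 2)) / 2


def fast_marching(W, source):
    """Approximate geodesic distance from `source` on a grid with local cost W."""
    n, m = len(W), len(W[0])
    D = [[math.inf] * m for _ in range(n)]
    frozen = [[False] * m for _ in range(n)]
    si, sj = source
    D[si][sj] = 0.0
    heap = [(0.0, si, sj)]
    while heap:
        _, i, j = heapq.heappop(heap)
        if frozen[i][j]:
            continue  # stale entry superseded by a smaller tentative value
        frozen[i][j] = True
        for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= ni < n and 0 <= nj < m and not frozen[ni][nj]:
                new = _eikonal_update(D, frozen, W, ni, nj)
                if new < D[ni][nj]:
                    D[ni][nj] = new
                    heapq.heappush(heap, (new, ni, nj))
    return D


W = [[1.0] * 5 for _ in range(5)]  # uniform unit cost
D = fast_marching(W, (0, 0))
print(D[4][4])  # compare with the l1 distance 8 and the Euclidean distance sqrt(32)
```

With a uniform cost, Dijkstra's algorithm on the same 4-connected grid gives the $\ell_1$ value 8 for the opposite corner; the fast marching value is noticeably smaller and closer to the Euclidean $\sqrt{32}$, although this first-order scheme still overestimates distances along diagonals.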
The FM algorithm replaces the graph update ($D_j \leftarrow \min_{k \sim j} D_k + W_j$) with a local resolution of the Eikonal equation $\|\nabla D\| = W$.Dijkstra's algorithmhttps://hugocisneros.com/notes/dijkstra_s_algorithm/Mon, 07 Sep 2020 10:28:00 +0200https://hugocisneros.com/notes/dijkstra_s_algorithm/ tags Applied maths, AlgorithmArticle: Open-endedness: The last grand challenge you’ve never heard ofhttps://hugocisneros.com/notes/open_endedness_the_last_grand_challenge_you_ve_never_heard_of/Mon, 24 Aug 2020 12:09:00 +0200https://hugocisneros.com/notes/open_endedness_the_last_grand_challenge_you_ve_never_heard_of/ authors Kenneth Stanley tags Artificial Intelligence, Open-ended Evolution source LinkKenneth Stanleyhttps://hugocisneros.com/notes/kenneth_stanley/Mon, 24 Aug 2020 12:04:00 +0200https://hugocisneros.com/notes/kenneth_stanley/Ken Stanley is a researcher at OpenAI.Crosshatch automatahttps://hugocisneros.com/notes/crosshatch_automata/Mon, 24 Aug 2020 10:33:00 +0200https://hugocisneros.com/notes/crosshatch_automata/ resources Medium article tags Cellular automataArticle: The Cartoon Picture of Magnets That Has Transformed Sciencehttps://hugocisneros.com/notes/the_cartoon_picture_of_magnets_that_has_transformed_science/Mon, 24 Aug 2020 09:18:00 +0200https://hugocisneros.com/notes/the_cartoon_picture_of_magnets_that_has_transformed_science/tags Ising model, Complex Systems source Quanta magazine The Ising model is an example of a very simply defined model from which complex behavior emerges. Originally introduced by Wilhelm Lenz and his graduate student Ernst Ising, its purpose was to understand why magnets lose their attractive power when heated past a certain temperature.
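A minimal sketch of the two-dimensional Ising model sampled with the Metropolis algorithm, assuming unit coupling, no external field, periodic boundaries and units where $k_B = 1$; the function names and parameter values are illustrative. Each spin is $\pm 1$, and a flip that changes the energy by $\Delta E$ is accepted with probability $\min(1, e^{-\Delta E / T})$. Starting from a fully magnetized lattice, the magnetization should stay close to 1 below the critical temperature (about 2.27 in these units) and fall towards 0 well above it, which is the loss of magnetization the model was meant to explain.

```python
import numpy as np


def metropolis_sweep(spins, beta, rng):
    """One Metropolis sweep (one attempted flip per site) of the 2-D Ising model.

    Assumes coupling J = 1, no external field, periodic boundaries, k_B = 1.
    """
    n = spins.shape[0]
    for _ in range(spins.size):
        i, j = rng.integers(0, n, size=2)
        neighbours = (spins[(i + 1) % n, j] + spins[(i - 1) % n, j]
                      + spins[i, (j + 1) % n] + spins[i, (j - 1) % n])
        delta_e = 2 * spins[i, j] * neighbours  # energy change if (i, j) flips
        if delta_e <= 0 or rng.random() < np.exp(-beta * delta_e):
            spins[i, j] *= -1


def magnetization_after(n=20, temperature=1.0, sweeps=100, seed=0):
    """Start fully magnetized and return the mean spin after `sweeps` sweeps."""
    rng = np.random.default_rng(seed)
    spins = np.ones((n, n), dtype=int)
    for _ in range(sweeps):
        metropolis_sweep(spins, 1.0 / temperature, rng)
    return spins.mean()


print(magnetization_after(temperature=1.0))   # below T_c: stays close to 1
print(magnetization_after(temperature=10.0))  # above T_c: fluctuates around 0
```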
The model was first studied in 1D, where it fails to show that a magnet stays magnetized, and was therefore abandoned.Article: The End of the RNA World Is Near, Biochemists Arguehttps://hugocisneros.com/notes/the_end_of_the_rna_world_is_near_biochemists_argue/Mon, 24 Aug 2020 09:17:00 +0200https://hugocisneros.com/notes/the_end_of_the_rna_world_is_near_biochemists_argue/source https://www.quantamagazine.org/the-end-of-the-rna-world-is-near-biochemists-argue-20171219/ tags Biological life This article is about alternatives to the dominant RNA-world theories. Objections to RNA: Crucial processes that we consider part of life could not have been carried out by a single polymer, and particularly not RNA. This is because these chemical reactions have rates ranging across 20 orders of magnitude. RNA cannot explain the emergence of the genetic code. It would have taken too long for RNA alone to find the mapping rules from the 64 three-nucleotide sequences to the 20 amino acids.Article: What Is an Individual? Biology Seeks Clues in Information Theory.https://hugocisneros.com/notes/what_is_an_individual_biology_seeks_clues_in_information_theory/Mon, 24 Aug 2020 09:17:00 +0200https://hugocisneros.com/notes/what_is_an_individual_biology_seeks_clues_in_information_theory/tags Life resources (Krakauer et al. 2020) source Quanta Magazine “In a way, [biology] is a science of individuality,” said Melanie Mitchell, a computer scientist at the Santa Fe Institute. And yet, the notion of what it means to be an individual often gets glossed over.
“So far we have a concept of ‘individual’ that’s very much like the concept of ‘pile,’” said Maxwell Ramstead, a postdoctoral researcher at McGill University.Melanie Mitchellhttps://hugocisneros.com/notes/melanie_mitchell/Mon, 24 Aug 2020 08:59:00 +0200https://hugocisneros.com/notes/melanie_mitchell/resources Website She has worked at the Santa Fe Institute and studies Complex Systems and Artificial Intelligence.Statistical complexityhttps://hugocisneros.com/notes/statistical_complexity/Wed, 29 Jul 2020 14:24:00 +0200https://hugocisneros.com/notes/statistical_complexity/tags Complexity metrics papers (Crutchfield, Young 1989) One interpretation of the statistical complexity is that it is the minimum amount of historical information required to make optimal forecasts of bits in $x$ at the error rate $h_\mu$. For periodic sequences, $C_\mu(x)$ is small ($\log_2 p$ for a sequence of period $p$, hence $0$ for a constant sequence), and for ideal random sequences $C_\mu(x) = 0$. Several researchers have tried to capture the properties of statistical complexity with practical alternatives. The resulting complexity metrics include:Talk: Alife 2020 keynote Lee Cronin - A Top Down Chemically Embodied Artificial Life Computationhttps://hugocisneros.com/notes/talk_alife_2020_keynote_lee_cronin_a_top_down_chemically_embodied_artificial_life_computation/Wed, 29 Jul 2020 14:24:00 +0200https://hugocisneros.com/notes/talk_alife_2020_keynote_lee_cronin_a_top_down_chemically_embodied_artificial_life_computation/tags Life, ALife 2020 Complex molecules are bio-signatures: they are the sign of complex (evolutionary?) processes that have been going on. Assembly theory Exploring complexity: Lee presents a theoretical idea for a complexity metric. Like many other metrics, he starts from the observation that neither entropy nor Kolmogorov complexity is suitable for considering the history of an object.
Instead of thinking in terms of disorder or complexity, why not simply ask “how has this object been assembled?”Urban sciencehttps://hugocisneros.com/notes/urban_science/Wed, 29 Jul 2020 10:10:00 +0200https://hugocisneros.com/notes/urban_science/Fractional calculushttps://hugocisneros.com/notes/fractional_calculus/Tue, 28 Jul 2020 17:29:00 +0200https://hugocisneros.com/notes/fractional_calculus/ tags Mathematics resources WikipediaProgram synthesishttps://hugocisneros.com/notes/program_synthesis/Mon, 27 Jul 2020 15:28:00 +0200https://hugocisneros.com/notes/program_synthesis/tags Computer science, Coding Program synthesis is the task of writing programs automatically for a given task. This is widely considered a very hard problem in the general case, as the programming languages we humans use are hard to manipulate “smoothly”. Compilation is a type of program synthesis where both the source language and the target language are well defined. A compiler is written for a given source language/target language pair and is therefore doing well-specified program synthesis.Epistasishttps://hugocisneros.com/notes/epistasis/Mon, 27 Jul 2020 15:14:00 +0200https://hugocisneros.com/notes/epistasis/Epistasis is about interactions between mutations in an evolving system. No epistasis corresponds to mutation effects “stacking” without any particular kind of interaction. Positive epistasis happens when the combined effect of two mutations is more positive than the sum of their individual contributions. Negative epistasis is the opposite: the combined effect is more negative than the sum of the individual contributions.
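A toy version of these definitions, assuming fitness effects are measured additively relative to the unmutated (wild-type) genotype; many studies use a multiplicative scale (log fitness) instead. The function name and the fitness values are hypothetical.

```python
def epistasis(w_wild, w_a, w_b, w_ab):
    """Deviation of the double mutant from the sum of the single-mutation effects.

    > 0: positive epistasis, < 0: negative epistasis, 0: effects simply stack.
    Assumes an additive scale, with effects measured relative to the wild type.
    """
    effect_a = w_a - w_wild
    effect_b = w_b - w_wild
    effect_ab = w_ab - w_wild
    return effect_ab - (effect_a + effect_b)


print(epistasis(10, 12, 13, 15))  # 0: no epistasis, the effects stack
print(epistasis(10, 12, 13, 18))  # 3: positive epistasis
print(epistasis(10, 12, 13, 11))  # -4: negative epistasis
```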
Cancer is an example of negative epistasis, where many mutations need to accumulate to obtain a cancerous cell.Noisehttps://hugocisneros.com/notes/noise/Mon, 27 Jul 2020 14:03:00 +0200https://hugocisneros.com/notes/noise/ tags Statistics, Applied mathsMAP-Eliteshttps://hugocisneros.com/notes/map_elites/Mon, 27 Jul 2020 13:24:00 +0200https://hugocisneros.com/notes/map_elites/tags Quality diversity, Reinforcement learning papers (Mouret, Clune 2015; Cully et al. 2015) MAP-Elites is an example of a QD algorithm. The behavior space is discretized into cells and, during exploration, only the best “elite” of each cell is kept. Individuals are added to the grid if they either fill an empty cell or are better than the existing elite of their cell. Bibliography Jean-Baptiste Mouret, Jeff Clune. April 19, 2015. "Illuminating Search Spaces by Mapping Elites"NK modelhttps://hugocisneros.com/notes/nk_model/Sun, 26 Jul 2020 19:57:00 +0200https://hugocisneros.com/notes/nk_model/ tags Complex SystemsCo-evolutionhttps://hugocisneros.com/notes/co_evolution/Sun, 26 Jul 2020 19:21:00 +0200https://hugocisneros.com/notes/co_evolution/ tags EvolutionAssembly languagehttps://hugocisneros.com/notes/assembly_language/Sun, 26 Jul 2020 19:09:00 +0200https://hugocisneros.com/notes/assembly_language/ tags Programming languagesFederated learninghttps://hugocisneros.com/notes/federated_learning/Sun, 26 Jul 2020 17:37:00 +0200https://hugocisneros.com/notes/federated_learning/ tags Machine learningLangton's loophttps://hugocisneros.com/notes/langton_s_loop/Sat, 25 Jul 2020 22:35:00 +0200https://hugocisneros.com/notes/langton_s_loop/ tags Christopher Langton, Cellular automataProgramming languageshttps://hugocisneros.com/notes/programming_languages/Wed, 22 Jul 2020 15:10:00 +0200https://hugocisneros.com/notes/programming_languages/ tags Computer science PL I use or have used: Python C Programming language C++ Javascript Rust Scala Java Ruby ELisp HaskellFunctional
programminghttps://hugocisneros.com/notes/functional_programming/Wed, 22 Jul 2020 10:15:00 +0200https://hugocisneros.com/notes/functional_programming/ tags Computer science, Coding Example of functional programming languages Lisp HaskellHaskellhttps://hugocisneros.com/notes/haskell/Wed, 22 Jul 2020 10:14:00 +0200https://hugocisneros.com/notes/haskell/ tags Programming languages, CodingAdaptive Computation Timehttps://hugocisneros.com/notes/adaptive_computation_time/Tue, 21 Jul 2020 08:54:00 +0200https://hugocisneros.com/notes/adaptive_computation_time/tags Neural networks, Algorithm Adaptive computation time (ACT) was introduced in (Graves 2017) as a way to make the amount of computation in RNNs adaptive. The network learns how many computational steps to use before emitting an output. This is done by outputting an extra halting probability at each update step, and considering two timelines: the input timeline, which plays the role of an outer loop: at each of its steps, a new input symbol is fed to the RNN.Autopoiesishttps://hugocisneros.com/notes/autopoiesis/Mon, 20 Jul 2020 21:35:00 +0200https://hugocisneros.com/notes/autopoiesis/ tags LifeTierrahttps://hugocisneros.com/notes/tierra/Mon, 20 Jul 2020 13:48:00 +0200https://hugocisneros.com/notes/tierra/ tags Artificial life, EvolutionSanta Fe Institutehttps://hugocisneros.com/notes/santa_fe_institute/Sun, 19 Jul 2020 22:31:00 +0200https://hugocisneros.com/notes/santa_fe_institute/ tags Complex Systems, PhysicsSimpson's paradoxhttps://hugocisneros.com/notes/simpson_s_paradox/Sun, 19 Jul 2020 22:10:00 +0200https://hugocisneros.com/notes/simpson_s_paradox/ tags StatisticsThe Simulated reality hypothesishttps://hugocisneros.com/notes/the_simulated_reality_hypothesis/Sun, 19 Jul 2020 21:27:00 +0200https://hugocisneros.com/notes/the_simulated_reality_hypothesis/tags Philosophy The simulation argument Nick Bostrom proposed a trilemma in 2003: “The fraction of human-level civilizations that reach a posthuman stage (that is, one capable
of running high-fidelity ancestor simulations) is very close to zero”, or “The fraction of posthuman civilizations that are interested in running simulations of their evolutionary history, or variations thereof, is very close to zero”, or “The fraction of all people with our kind of experiences that are living in a simulation is very close to one.”Nick Bostromhttps://hugocisneros.com/notes/nick_bostrom/Sun, 19 Jul 2020 21:25:00 +0200https://hugocisneros.com/notes/nick_bostrom/ tags PhilosophyBongard problemshttps://hugocisneros.com/notes/bongard_problems/Fri, 17 Jul 2020 13:46:00 +0200https://hugocisneros.com/notes/bongard_problems/ tags Artificial intelligence testEvaluating NLPhttps://hugocisneros.com/notes/evaluating_nlp/Fri, 17 Jul 2020 13:46:00 +0200https://hugocisneros.com/notes/evaluating_nlp/tags Natural language processing Language model evaluation Perplexity For a given word sequence $\mathbf{w} = (w_1, …, w_n)$, perplexity (PPL) is defined as \[ PPL = 2^{-\frac{1}{n} \sum_{i=1}^n \log_2 P(w_i | w_{i-1} … w_1 )} \] The exponent is the cross-entropy, in bits, between the empirical distribution of test words and the model's predicted conditional word distribution, so perplexity is 2 raised to the power of this cross-entropy.
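A minimal sketch of the perplexity formula, assuming the probability the model assigned to each test word given its history has already been computed and is passed as a plain list (the function name is hypothetical).

```python
import math


def perplexity(word_probs):
    """Perplexity from the conditional probabilities a model gave each test word."""
    n = len(word_probs)
    cross_entropy_bits = -sum(math.log2(p) for p in word_probs) / n
    return 2 ** cross_entropy_bits


print(perplexity([1 / 256] * 10))  # 256.0: every word costs 8 bits
print(perplexity([0.5, 0.25]))     # 2 ** 1.5, about 2.83
```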
A language model that encodes each word with 8 bits on average has a perplexity of 256 ($2^8$).Abstraction and Reasoning Corpushttps://hugocisneros.com/notes/abstraction_and_reasoning_corpus/Fri, 17 Jul 2020 13:44:00 +0200https://hugocisneros.com/notes/abstraction_and_reasoning_corpus/ tags Artificial intelligence testAlgorithmic probabilityhttps://hugocisneros.com/notes/algorithmic_probability/Tue, 14 Jul 2020 08:34:00 +0200https://hugocisneros.com/notes/algorithmic_probability/ tags Complexity, Algorithmic Information theoryAutomated discovery in complex systemshttps://hugocisneros.com/notes/automated_discovery_in_complex_systems/Tue, 14 Jul 2020 08:33:00 +0200https://hugocisneros.com/notes/automated_discovery_in_complex_systems/tags Complex Systems Evolutionary algorithms and CAs Evolutionary algorithms have been used to find Cellular automata rules with specific behaviors (Mitchell et al. 1996; Sapin et al. 2003). The objective is to optimize a fitness function (majority of cells, presence of gliders and periodic patterns, etc.). Bibliography Melanie Mitchell, James P. Crutchfield, Rajarshi Das. 1996. "Evolving Cellular Automata with Genetic Algorithms: A Review of Recent Work"Backward RNNhttps://hugocisneros.com/notes/backward_rnn/Tue, 14 Jul 2020 08:33:00 +0200https://hugocisneros.com/notes/backward_rnn/tags Recurrent neural networks Regular RNNs process input in sequence. When applied to a language modeling task, one tries to predict a word given the previous ones.
For example, with the sentence The quick brown fox jumps over the lazy, a classical RNN will initialize an internal state $s_0$ and process each word in sequence, starting from The and updating its internal state with each new word in order to make a final prediction.Berry's paradoxhttps://hugocisneros.com/notes/berry_s_paradox/Tue, 14 Jul 2020 08:33:00 +0200https://hugocisneros.com/notes/berry_s_paradox/Berry’s paradox is a sentence of the form “The smallest positive integer not definable in under sixty letters” (a phrase with fifty-seven letters). An argument very similar to Berry’s paradox is used in the proof of uncomputability of Kolmogorov complexity. Resolution An interesting study and resolution of Berry’s paradoxCausal inferencehttps://hugocisneros.com/notes/causal_inference/Tue, 14 Jul 2020 08:33:00 +0200https://hugocisneros.com/notes/causal_inference/ tags StatisticsComputability theoryhttps://hugocisneros.com/notes/computability_theory/Tue, 14 Jul 2020 08:31:00 +0200https://hugocisneros.com/notes/computability_theory/Edge detectionhttps://hugocisneros.com/notes/edge_detection/Tue, 14 Jul 2020 08:30:00 +0200https://hugocisneros.com/notes/edge_detection/tags Image processing Canny edge detection Canny edge detection is the most famous edge detection algorithm, originally developed by John Canny in 1986. The algorithm has 5 steps: Smooth the image with Gaussian filtering. Compute intensity gradients: the first derivatives in the horizontal ($\mathbf{G}_x$) and vertical ($\mathbf{G}_y$) directions are computed, then the gradient intensity $\mathbf{G} = \sqrt{\mathbf{G}_x^2 + \mathbf{G}_y^2}$ and direction $\mathbf{\Theta} = \text{atan2}(\mathbf{G}_y, \mathbf{G}_x)$ are derived from them.
Edge thinning (non-maximum suppression) to reduce blurring from the first two steps.Effective measure complexityhttps://hugocisneros.com/notes/effective_measure_complexity/Tue, 14 Jul 2020 08:30:00 +0200https://hugocisneros.com/notes/effective_measure_complexity/ tags Complexity metricsELisphttps://hugocisneros.com/notes/elisp/Tue, 14 Jul 2020 08:29:00 +0200https://hugocisneros.com/notes/elisp/ELisp is a dialect of the Lisp programming language.Gaussian Processeshttps://hugocisneros.com/notes/gaussian_processes/Tue, 14 Jul 2020 08:28:00 +0200https://hugocisneros.com/notes/gaussian_processes/ tags Machine learning resources K. Bailey’s blog postHalting problemhttps://hugocisneros.com/notes/halting_problem/Tue, 14 Jul 2020 08:27:00 +0200https://hugocisneros.com/notes/halting_problem/ tags Computability theoryHaskell Curryhttps://hugocisneros.com/notes/haskell_curry/Tue, 14 Jul 2020 08:27:00 +0200https://hugocisneros.com/notes/haskell_curry/Information theoryhttps://hugocisneros.com/notes/information_theory/Tue, 14 Jul 2020 08:27:00 +0200https://hugocisneros.com/notes/information_theory/Javahttps://hugocisneros.com/notes/java/Tue, 14 Jul 2020 08:27:00 +0200https://hugocisneros.com/notes/java/ tags Programming languages, CodingJavascripthttps://hugocisneros.com/notes/javascript/Tue, 14 Jul 2020 08:27:00 +0200https://hugocisneros.com/notes/javascript/ tags Programming languages, CodingJevons paradoxhttps://hugocisneros.com/notes/jevons_paradox/Tue, 14 Jul 2020 08:26:00 +0200https://hugocisneros.com/notes/jevons_paradox/tags Economics, Climate resources (York, McGee 2016; Polimeni et al. 2015), Wikipedia, Real climate economics blog posts (Jim Barrett) Definition Jevons Paradox is used to describe the situation where an increase in resource efficiency triggered by technological innovation has the counter-intuitive effect of raising the demand and increasing the overall consumption. It was first described in W. S. Jevons’ book The Coal Question in 1865.
It is closely related to another paradox well known in road planning (the Downs–Thomson paradox) and to Wirth’s law in software engineering.John Von Neumannhttps://hugocisneros.com/notes/john_von_neumann/Tue, 14 Jul 2020 08:26:00 +0200https://hugocisneros.com/notes/john_von_neumann/Kaya identityhttps://hugocisneros.com/notes/kaya_identity/Tue, 14 Jul 2020 08:24:00 +0200https://hugocisneros.com/notes/kaya_identity/tags Climate Definition It was developed by Japanese economist Yoichi Kaya. $F$ is global CO2 emissions from human sources, $P$ is global population, $G$ is global GDP, $E$ is global energy consumption. \[ F = P \times \frac{G}{P} \times \frac{E}{G} \times \frac{F}{E} \] The fractional terms correspond to well-studied quantities: $G/P$ is the GDP per capita, $E/G$ is the energy intensity of GDP and $F/E$ is the carbon intensity of energy. Interpretation This identity is simply a rewrite of $F=F$ in terms of commonly used quantities to highlight several levers one could act on to reduce CO2 emissions.Konrad Zusehttps://hugocisneros.com/notes/konrad_zuse/Tue, 14 Jul 2020 08:24:00 +0200https://hugocisneros.com/notes/konrad_zuse/resources Juergen Schmidhuber’s page In 1941, he constructed the first fully functional programmable computer, the Z3. He suggested in 1967 in his book Calculating Space that the universe is running on a Cellular automaton. This is now known as Zuse’s thesis.Languagehttps://hugocisneros.com/notes/language/Tue, 14 Jul 2020 08:22:00 +0200https://hugocisneros.com/notes/language/Logichttps://hugocisneros.com/notes/logic/Tue, 14 Jul 2020 08:22:00 +0200https://hugocisneros.com/notes/logic/Logical depthhttps://hugocisneros.com/notes/logical_depth/Tue, 14 Jul 2020 08:21:00 +0200https://hugocisneros.com/notes/logical_depth/tags Complexity metrics references (Bennett 1995) Logical depth can be defined as the run time of the minimal program $M_{\min}(x)$ that outputs an input $x$ on a Turing Machine; the length of this minimal program is the Kolmogorov complexity of $x$.
It is therefore uncomputable (because the minimal representation is uncomputable). Bibliography Charles H. Bennett. 1995. "Logical Depth and Physical Complexity". In The Universal Turing Machine: A Half-Century Survey, edited by Rolf Herken, 2:207–35.Mathematicshttps://hugocisneros.com/notes/mathematics/Tue, 14 Jul 2020 08:21:00 +0200https://hugocisneros.com/notes/mathematics/Mean field theory of neural networks (talk)https://hugocisneros.com/notes/mean_field_theory_of_neural_networks/Tue, 14 Jul 2020 08:21:00 +0200https://hugocisneros.com/notes/mean_field_theory_of_neural_networks/speaker Andrea Montanari tags Neural networks Two layers Neural nets to Wasserstein gradient flows Classical Supervised learning setting Morphogenesishttps://hugocisneros.com/notes/morphogenesis/Tue, 14 Jul 2020 08:20:00 +0200https://hugocisneros.com/notes/morphogenesis/ tags Biological life, PhysicsOntogeny recapitulates phylogenyhttps://hugocisneros.com/notes/ontogeny_recapitulates_phylogeny/Mon, 13 Jul 2020 18:40:00 +0200https://hugocisneros.com/notes/ontogeny_recapitulates_phylogeny/tags Evolution, Biological life link Wikipedia This is a generalization principle in biology stating that stages of development of an organism often resemble some of its ancestors.Public key encryptionhttps://hugocisneros.com/notes/public_key_encryption/Mon, 13 Jul 2020 18:39:00 +0200https://hugocisneros.com/notes/public_key_encryption/ tags Cryptography RSA Diffie-Hellman Elliptic curve cryptographyAutoencodershttps://hugocisneros.com/notes/autoencoders/Mon, 13 Jul 2020 10:19:00 +0200https://hugocisneros.com/notes/autoencoders/tags Neural networks, Data representation Autoencoders and PCA The relation between Autoencoders and PCA is strong. In particular, a very small autoencoder with only linear activations seems intuitively very close to a PCA decomposition.
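A small numerical check of this intuition, assuming centered data, a 2-unit bottleneck, no activations anywhere and plain full-batch gradient descent on the squared reconstruction error (all illustrative choices; the function names are hypothetical). By the Eckart–Young theorem, no rank-$k$ linear map can beat the truncated SVD (PCA) reconstruction, so the trained linear autoencoder's error should approach the PCA error from above.

```python
import numpy as np


def linear_autoencoder_error(X, k, lr=0.01, steps=3000, seed=0):
    """Train X -> X @ E @ D (no activations) by gradient descent and return the
    mean squared reconstruction error per sample."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    E = rng.normal(scale=0.1, size=(d, k))  # encoder weights
    D = rng.normal(scale=0.1, size=(k, d))  # decoder weights
    for _ in range(steps):
        Z = X @ E                # k-dimensional codes
        R = Z @ D - X            # reconstruction residual
        grad_D = 2.0 / n * Z.T @ R
        grad_E = 2.0 / n * X.T @ (R @ D.T)
        E -= lr * grad_E
        D -= lr * grad_D
    return np.sum((X @ E @ D - X) ** 2) / n


def pca_error(X, k):
    """Error of the best rank-k reconstruction (truncated SVD / PCA)."""
    singular_values = np.linalg.svd(X, compute_uv=False)
    return np.sum(singular_values[k:] ** 2) / X.shape[0]


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.2])
X -= X.mean(axis=0)  # PCA assumes centered data
print(linear_autoencoder_error(X, 2), pca_error(X, 2))  # the two should nearly match
```

The learned encoder is not necessarily the PCA basis itself: any invertible mixing of the top principal directions gives the same reconstruction, which is why only the errors are compared.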
(Bourlard, Kamp 1988) gives an interesting analysis of the uselessness of the activation functions in the encoding layers of an autoencoder when there are no activations in the output layer. In that case, autoencoding is closely related to a singular value decomposition of the input data.Lisphttps://hugocisneros.com/notes/lisp/Fri, 10 Jul 2020 11:17:00 +0200https://hugocisneros.com/notes/lisp/tags Programming languages Lisp has been a popular family of languages for Artificial Intelligence research, from the 1970s to the 1990s.Reaction-diffusionhttps://hugocisneros.com/notes/reaction_diffusion/Fri, 10 Jul 2020 10:05:00 +0200https://hugocisneros.com/notes/reaction_diffusion/ tags Physics, MorphogenesisTheory of computationhttps://hugocisneros.com/notes/theory_of_computation/Fri, 10 Jul 2020 09:15:00 +0200https://hugocisneros.com/notes/theory_of_computation/ tags Computer scienceAlgorithmhttps://hugocisneros.com/notes/algorithm/Fri, 10 Jul 2020 09:13:00 +0200https://hugocisneros.com/notes/algorithm/ tags Computer science, CodingComputer sciencehttps://hugocisneros.com/notes/computer_science/Fri, 10 Jul 2020 09:12:00 +0200https://hugocisneros.com/notes/computer_science/Rice’s theoremhttps://hugocisneros.com/notes/rice_s_theorem/Thu, 09 Jul 2020 14:31:00 +0200https://hugocisneros.com/notes/rice_s_theorem/RNA-worldhttps://hugocisneros.com/notes/rna_world/Thu, 09 Jul 2020 14:31:00 +0200https://hugocisneros.com/notes/rna_world/ tags Biological lifeSIR modelhttps://hugocisneros.com/notes/sir_model/Thu, 09 Jul 2020 14:29:00 +0200https://hugocisneros.com/notes/sir_model/tags Applied maths resources Wikipedia Simplest form The SIR model is defined for a population of size $N$, with $S$ the number of susceptible people, $I$ the number of infected people and $R$ the number of people who have recovered.
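A minimal forward-Euler sketch of the SIR dynamics, assuming illustrative parameters: at each step, susceptible people become infected at rate $\beta I S / N$ and infected people recover at rate $\gamma I$, which are the flows in the SIR differential equations for $S$, $I$ and $R$. The function name, step size and parameter values are hypothetical, and a dedicated ODE solver would be more accurate.

```python
def simulate_sir(n, i0, beta, gamma, dt=0.1, steps=2000):
    """Forward-Euler integration of the basic SIR model.

    Returns the list of (S, I, R) values, one entry per time step.
    """
    s, i, r = float(n - i0), float(i0), 0.0
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * i * s / n * dt  # flow S -> I
        new_recoveries = gamma * i * dt         # flow I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(n=1000, i0=1, beta=0.3, gamma=0.1)  # R0 = beta / gamma = 3
s_end, i_end, r_end = history[-1]
print(round(r_end))  # most of the population has been infected by t = 200
```

Because every person leaving one compartment enters another, $S + I + R$ stays equal to $N$ at every step, which is a cheap sanity check on any such integration.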
The following system of differential equations governs the evolution of those three variables: \[ \frac{dS}{dt} = - \frac{\beta I S}{N} \] \[ \frac{dI}{dt} = \frac{\beta I S }{N}- \gamma I \] \[ \frac{dR}{dt} = \gamma I \]Statistical physicshttps://hugocisneros.com/notes/statistical_physics/Thu, 09 Jul 2020 14:29:00 +0200https://hugocisneros.com/notes/statistical_physics/ tags Physics, StatisticsStatisticshttps://hugocisneros.com/notes/statistics/Thu, 09 Jul 2020 14:28:00 +0200https://hugocisneros.com/notes/statistics/ tags Applied mathsC Programming languagehttps://hugocisneros.com/notes/c_programming_language/Thu, 09 Jul 2020 12:44:00 +0200https://hugocisneros.com/notes/c_programming_language/C++https://hugocisneros.com/notes/c/Thu, 09 Jul 2020 12:44:00 +0200https://hugocisneros.com/notes/c/ tags Programming languages, CodingSurprisingly Turing-Completehttps://hugocisneros.com/notes/surprisingly_turing_complete/Thu, 09 Jul 2020 12:43:00 +0200https://hugocisneros.com/notes/surprisingly_turing_complete/tags Turing-completeness source Gwern Branwen’s website Turing-completeness is common TC [Turing-completeness], […] is […] weirdly common: one might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite and it is difficult to write a useful system which does not immediately tip over into TC. 
I’ve often been amazed at how common TC can be in sufficiently complicated systems.Symmetric encryptionhttps://hugocisneros.com/notes/symmetric_encryption/Thu, 09 Jul 2020 12:43:00 +0200https://hugocisneros.com/notes/symmetric_encryption/ tags CryptographyTalk: Artificial Intelligence: A Guide for Thinking Humanshttps://hugocisneros.com/notes/talk_artificial_intelligence_a_guide_for_thinking_humans/Thu, 09 Jul 2020 12:43:00 +0200https://hugocisneros.com/notes/talk_artificial_intelligence_a_guide_for_thinking_humans/presenter Melanie Mitchell source Youtube Talk at the Santa Fe Institute on Nov 13, 2019. What is Artificial Intelligence? Many different things fall under the name AI (self-driving cars, chess-playing machines, image classifiers, video game AIs, etc.). [Building] machines that perform tasks normally requiring human intelligence. (Nils Nilsson, 1971) Chess was thought to be the pinnacle of intelligence until a brute-force approach was found that could beat the best human players.Talk: Differentiation of black-box combinatorial solvershttps://hugocisneros.com/notes/talk_differentiation_of_black_box_combinatorial_solvers/Thu, 09 Jul 2020 12:43:00 +0200https://hugocisneros.com/notes/talk_differentiation_of_black_box_combinatorial_solvers/presenter Michal Rolinek tags Combinatorics, Machine learning The goal is to merge combinatorial optimization and deep learning. Make use of strong battle-tested optimization methods. Some of those can find almost-optimal solutions to NP-hard problems in ~quadratic time. The goal is to cover many combinatorial problems (TSP, multi-cut, etc.).
The method should have a fast backward pass, be theoretically sound and be easy to use. However, the goal is not simply to relax a combinatorial problem to make it differentiable, because there is often a huge price to pay for this.The Lottery ticket hypothesishttps://hugocisneros.com/notes/the_lottery_ticket_hypothesis/Thu, 09 Jul 2020 12:42:00 +0200https://hugocisneros.com/notes/the_lottery_ticket_hypothesis/tags Neural network training resources The AI podcast papers (Frankle, Carbin 2018) When training very large neural networks, the obtained net might have a lot of unused neurons. It is possible, through neural network pruning, to remove a lot of those unused connections to make the overall architecture lighter and faster to run on some hardware. However, once you have the pruned architecture, it will often not be able to learn anything interesting when it is trained from scratch.Decentralizationhttps://hugocisneros.com/notes/decentralization/Thu, 09 Jul 2020 11:12:00 +0200https://hugocisneros.com/notes/decentralization/ tags EconomicsPhysicshttps://hugocisneros.com/notes/physics/Thu, 02 Jul 2020 10:23:00 +0200https://hugocisneros.com/notes/physics/Data representationhttps://hugocisneros.com/notes/data_representation/Thu, 02 Jul 2020 10:22:00 +0200https://hugocisneros.com/notes/data_representation/tags Machine learning Data representation is about finding compact representations of high-dimensional data (such as images, videos, 3D shapes, etc.).
Several methods have been developed for this purpose, such as PCA, neural network-based representations and Autoencoders.Evolutionhttps://hugocisneros.com/notes/evolution/Thu, 02 Jul 2020 10:22:00 +0200https://hugocisneros.com/notes/evolution/ tags Artificial life, LifeDowns–Thomson paradoxhttps://hugocisneros.com/notes/downs_thomson_paradox/Thu, 02 Jul 2020 10:21:00 +0200https://hugocisneros.com/notes/downs_thomson_paradox/Economic liberalismhttps://hugocisneros.com/notes/economic_liberalism/Thu, 02 Jul 2020 10:21:00 +0200https://hugocisneros.com/notes/economic_liberalism/tags Economics Definition The weekly newspaper The Economist is often described as having economic liberalism among its political alignments. Decentralization, globalization and economic liberalism Many economic liberalism advocates consider that economic decisions should follow some natural tendencies. This, to me, is related to some energy minimization principles where letting everything go normally should lead to the optimal configuration. In the case of Decentralization, an almost immediate effect is the decrease of prices of some common goods, which usually makes people happier.Boltzmann brainhttps://hugocisneros.com/notes/boltzmann_brain/Thu, 02 Jul 2020 10:20:00 +0200https://hugocisneros.com/notes/boltzmann_brain/tags Physics, Statistical physics Boltzmann brain thoughts The Boltzmann brain is an interesting concept initially offered in response to one of Ludwig Boltzmann’s explanations for the low-entropy state of the Universe. He hypothesized that even a fully random universe would fluctuate towards lower-entropy states.
The issue is that many phenomena such as evolved life on Earth are so far from equilibrium that they look extremely unlikely to have happened.Combinatoricshttps://hugocisneros.com/notes/combinatorics/Thu, 02 Jul 2020 10:20:00 +0200https://hugocisneros.com/notes/combinatorics/ tags MathematicsJohn Conwayhttps://hugocisneros.com/notes/john_conway/Thu, 02 Jul 2020 10:20:00 +0200https://hugocisneros.com/notes/john_conway/ tags MathematicsMarvin Minskyhttps://hugocisneros.com/notes/marvin_minsky/Thu, 02 Jul 2020 10:20:00 +0200https://hugocisneros.com/notes/marvin_minsky/Neural network pruninghttps://hugocisneros.com/notes/neural_network_pruning/Thu, 02 Jul 2020 10:20:00 +0200https://hugocisneros.com/notes/neural_network_pruning/tags Neural networks papers (LeCun et al. 1990; Hassibi, Stork 1993; Han et al. 2015; Li et al. 2016) Bibliography Yann LeCun, John S. Denker, Sara A. Solla. 1990. "Optimal Brain Damage". In Advances in Neural Information Processing Systems, 598–605. Babak Hassibi, David G. Stork. 1993. "Second Order Derivatives for Network Pruning: Optimal Brain Surgeon". In Advances in Neural Information Processing Systems, 164–71.
Song Han, Jeff Pool, John Tran, William Dally.Rubyhttps://hugocisneros.com/notes/ruby/Thu, 02 Jul 2020 10:20:00 +0200https://hugocisneros.com/notes/ruby/ tags Programming languages, CodingScalahttps://hugocisneros.com/notes/scala/Thu, 02 Jul 2020 10:20:00 +0200https://hugocisneros.com/notes/scala/ tags Programming languages, CodingAbelian sandpile modelhttps://hugocisneros.com/notes/abelian_sandpile_model/Thu, 02 Jul 2020 10:09:00 +0200https://hugocisneros.com/notes/abelian_sandpile_model/ tags Cellular automata resources WikipediaHyperbolic geometryhttps://hugocisneros.com/notes/hyperbolic_geometry/Thu, 02 Jul 2020 09:46:00 +0200https://hugocisneros.com/notes/hyperbolic_geometry/ tags MathematicsChaoshttps://hugocisneros.com/notes/chaos/Thu, 02 Jul 2020 08:45:00 +0200https://hugocisneros.com/notes/chaos/tags Physics Chaos is a striking example of emergence. Deterministic equations of motion lead to behavior that becomes completely unpredictable over time. Randomness has emerged from these deterministic laws. From (Crutchfield 1994): Where in the determinism did the randomness come from? The answer is that the effective dynamic, which maps from initial conditions to states at a later time, becomes so complicated that an observer can neither measure the system accurately enough nor compute with sufficient power to predict the future behavior when given an initial condition.Complexityhttps://hugocisneros.com/notes/complexity/Thu, 02 Jul 2020 08:39:00 +0200https://hugocisneros.com/notes/complexity/resources Page of Pablo Funes’ PhD thesis What is complexity? The question is far too vast to be answered in anything shorter than a whole book. I am planning on dedicating an entire post to measuring complexity with a range of metrics that people have come up with in the past.
A big question I’m asking myself is: “How much does complexity depend on subjectivity and the observer?”Makehttps://hugocisneros.com/notes/make/Wed, 01 Jul 2020 20:23:00 +0200https://hugocisneros.com/notes/make/tags Coding Make is a build automation tool. Don’t deal with tabs I have been annoyed with tabs in Makefiles many times. Some editors or copy-pasting functions automatically convert tabs to spaces and vice versa, and this can break your Makefile. With GNU Make 4.0 or later, it is possible to set the recipe prefix to some other fixed token. To use > as a prefix, put this at the beginning of your Makefile:Cellular automata as regular languageshttps://hugocisneros.com/notes/cellular_automata_as_regular_languages/Wed, 01 Jul 2020 14:20:00 +0200https://hugocisneros.com/notes/cellular_automata_as_regular_languages/tags Cellular automata, Finite state machines From (Hanson, Crutchfield 1997): Finite state machines are appropriate for investigating pattern dynamics of CAs for a number of reasons, among which we may note the following: FAs encompass the full range of behavior types from periodic to complex to random; Characterization of patterns using FAs makes possible a definition of pattern complexity which is both natural and computable in practice; Ensemble evolution in the space of regular languages is closed under the CA rule; The CA update rule is itself an FST; Automated inference techniques exist for reconstructing FAs from experimental data Bibliography James E.Chemical reaction networkhttps://hugocisneros.com/notes/chemical_reaction_network/Wed, 01 Jul 2020 08:44:00 +0200https://hugocisneros.com/notes/chemical_reaction_network/ tags Complex SystemsKernel Methodshttps://hugocisneros.com/notes/kernel_methods/Wed, 01 Jul 2020 08:14:00 +0200https://hugocisneros.com/notes/kernel_methods/ tags Machine learningPhilosophyhttps://hugocisneros.com/notes/philosophy/Fri, 26 Jun 2020 11:28:00 +0200https://hugocisneros.com/notes/philosophy/