<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Notes on Hugo Cisneros</title><link>https://hugocisneros.com/notes/</link><description>Recent content in Notes on Hugo Cisneros</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><lastBuildDate>Sun, 26 Apr 2026 17:23:00 +0200</lastBuildDate><atom:link href="https://hugocisneros.com/notes/index.xml" rel="self" type="application/rss+xml"/><item><title>Inverse reinforcement learning</title><link>https://hugocisneros.com/notes/inverse_reinforcement_learning/</link><pubDate>Sun, 26 Apr 2026 17:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/inverse_reinforcement_learning/</guid><description>tags Reinforcement learning, Reinforcement learning with human feedback, Reward shaping Recovering an unknown reward function from expert demonstrations such that the demonstrated behavior is optimal under it.
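A minimal sketch of one classical instance, maximum-entropy IRL, in a toy one-step (bandit) setting. Everything here is a hypothetical assumption: three actions with hand-picked feature vectors (`FEATURES`), a reward that is linear in those features, and an expert that acts softmax-optimally rather than strictly optimally. Under these assumptions, fitting the reward weights reduces to matching the expert's feature expectations. A sequential MDP would also need a soft value-iteration inner loop.

```python
import math
import random

# Hypothetical toy problem: 3 actions, each described by a 2-d feature vector.
# Assumption: the reward is linear, r(a) = theta . phi(a).
FEATURES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def softmax_policy(theta):
    """Max-ent expert model: pi(a) is proportional to exp(r(a))."""
    scores = [sum(t * f for t, f in zip(theta, phi)) for phi in FEATURES]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def feature_expectation(probs):
    """Expected feature vector E[phi(a)] under a distribution over actions."""
    dim = len(FEATURES[0])
    return [sum(p * phi[k] for p, phi in zip(probs, FEATURES)) for k in range(dim)]


def maxent_irl(demos, steps=2000, lr=0.5):
    """Gradient ascent on the log-likelihood of the demonstrations.

    The gradient is E_expert[phi] - E_policy[phi], so at the fixed point the
    policy matches the expert's empirical feature expectations.
    """
    freqs = [demos.count(a) / len(demos) for a in range(len(FEATURES))]
    target = feature_expectation(freqs)
    theta = [0.0] * len(FEATURES[0])
    for _ in range(steps):
        model = feature_expectation(softmax_policy(theta))
        theta = [t + lr * (e - m) for t, e, m in zip(theta, target, model)]
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    expert = softmax_policy([2.0, -1.0])  # hidden "true" reward weights
    demos = rng.choices(range(len(FEATURES)), weights=expert, k=5000)
    print("recovered theta:", [round(t, 2) for t in maxent_irl(demos)])
```

The demonstrations are only action samples. The reward weights are recovered because they are the parameters under which the observed behavior is most likely.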
IRL is a classical setting, extended in modern post-training via RLHF, adversarial IRL, and demonstration-conditioned implicit rewards.</description></item><item><title>Notes on: Self-Distillation Enables Continual Learning by Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal (2026)</title><link>https://hugocisneros.com/notes/shenfeldselfdistillationcontinual2026/</link><pubDate>Sun, 26 Apr 2026 17:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/shenfeldselfdistillationcontinual2026/</guid><description>tags Continual learning, Catastrophic forgetting, Distillation, In-context learning, Large language models source (Shenfeld et al. 2026) Summary This paper introduces Self-Distillation Fine-Tuning (SDFT), an on-policy alternative to supervised fine-tuning (SFT) for continual learning from expert demonstrations. The motivation is a known asymmetry in post-training: on-policy reinforcement learning reduces catastrophic forgetting but requires explicit reward functions, while SFT works from cheap demonstrations but is inherently off-policy and tends to overwrite prior capabilities.</description></item><item><title>Off-policy distillation</title><link>https://hugocisneros.com/notes/off_policy_distillation/</link><pubDate>Sun, 26 Apr 2026 17:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/off_policy_distillation/</guid><description>tags Distillation, Supervised fine-tuning, Large language models Distillation regime where the student is trained on a fixed dataset of trajectories generated by a source other than its own current policy (typically the teacher, an earlier checkpoint, or a static expert corpus).
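The sketch below shows the off-policy regime on a toy problem. It assumes a bigram "language" over three tokens, where the state is simply the previous token. `teacher`, `off_policy_distill` and all hyperparameters are hypothetical. The dataset is sampled once from the teacher, and the student is then fit by full-batch cross-entropy on those transitions only.

```python
import math
import random

V = 3  # hypothetical toy vocabulary size; the "state" is the previous token


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_sequence(policy, length, rng, start=0):
    """Roll out a bigram policy: policy[s] is the next-token distribution at state s."""
    seq = [start]
    for _ in range(length):
        seq.append(rng.choices(range(V), weights=policy[seq[-1]])[0])
    return seq


def off_policy_distill(teacher, n_seqs=100, length=20, steps=500, lr=1.0, seed=0):
    rng = random.Random(seed)
    # The dataset is generated ONCE by the teacher and never refreshed, so the
    # student only ever sees states from the teacher's distribution.
    dataset = [sample_sequence(teacher, length, rng) for _ in range(n_seqs)]
    counts = [[0] * V for _ in range(V)]
    for seq in dataset:
        for s, nxt in zip(seq[:-1], seq[1:]):
            counts[s][nxt] += 1
    logits = [[0.0] * V for _ in range(V)]
    for _ in range(steps):
        for s in range(V):
            n_s = sum(counts[s])
            if n_s == 0:
                continue  # state absent from the data: no training signal at all
            probs = softmax(logits[s])
            for a in range(V):
                # Gradient of the mean cross-entropy at state s: p - empirical frequency.
                logits[s][a] -= lr * (probs[a] - counts[s][a] / n_s)
    return [softmax(row) for row in logits]


teacher = [[0.7, 0.3, 0.0], [0.4, 0.6, 0.0], [1 / 3, 1 / 3, 1 / 3]]
student = off_policy_distill(teacher)
print([[round(p, 2) for p in row] for row in student])
```

This teacher never emits token 2, so state 2 never appears in the data. The student's distribution there therefore stays at its uniform initialization. The student gets no supervision on a state its own policy can still reach.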
Standard knowledge distillation and most supervised fine-tuning recipes are off-policy: the training samples do not depend on the student&amp;rsquo;s evolving distribution, so the student is supervised on states that can differ from the ones it visits at inference time.</description></item><item><title>On-policy distillation</title><link>https://hugocisneros.com/notes/on_policy_distillation/</link><pubDate>Sun, 26 Apr 2026 17:18:00 +0200</pubDate><guid>https://hugocisneros.com/notes/on_policy_distillation/</guid><description>tags Distillation, Reinforcement learning, Large language models, Continual learning Distillation regime where the student samples its own trajectories and minimizes divergence to a teacher evaluated on those samples.
It combines the dense, per-token credit assignment of distillation with the on-policy property that the student is trained on the states it actually visits.
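Here is a toy sketch of on-policy distillation. It assumes a bigram model over three tokens, with hypothetical names and hyperparameters. The student rolls out its current policy, and at every visited state it takes a gradient step on the reverse KL to the teacher.

Two simplifications apply:
- The KL is computed exactly over the tiny vocabulary rather than estimated from sampled tokens.
- The dependence of the visitation frequencies on the student's parameters is ignored, which is a common simplification.

The teacher must be strictly positive for the reverse KL to be finite.

```python
import math
import random

V = 3  # hypothetical toy vocabulary size; the "state" is the previous token


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reverse_kl(p, q):
    """KL(p || q) between categorical distributions; q must be strictly positive."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def on_policy_distill(teacher, iters=300, n_rollouts=8, length=10, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [[0.0] * V for _ in range(V)]
    for _ in range(iters):
        # 1. Roll out the student's CURRENT policy and record which states it visits.
        visits = [0] * V
        for _ in range(n_rollouts):
            s = 0
            for _ in range(length):
                visits[s] += 1
                s = rng.choices(range(V), weights=softmax(logits[s]))[0]
        total = sum(visits)
        # 2. At each visited state, step down the reverse KL(student || teacher),
        #    weighted by the student's own visitation frequency.
        for s in range(V):
            if visits[s] == 0:
                continue
            p = softmax(logits[s])
            kl = reverse_kl(p, teacher[s])
            for a in range(V):
                # d KL(p || q) / d logit_a = p_a * (log p_a - log q_a - KL(p || q))
                grad = p[a] * (math.log(p[a]) - math.log(teacher[s][a]) - kl)
                logits[s][a] -= lr * (visits[s] / total) * grad
    return [softmax(row) for row in logits]


teacher = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
student = on_policy_distill(teacher)
print([round(reverse_kl(student[s], teacher[s]), 5) for s in range(V)])
```

The supervision signal at a state is a full distribution from the teacher, not a single target token. It is only applied where the student's own rollouts go.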
Contrast with off-policy distillation, where the student is trained on trajectories generated by a fixed teacher / dataset / earlier checkpoint rather than its own current policy. The off-policy regime is cheaper (no generation loop) but suffers from distribution mismatch.</description></item><item><title>Notes on: Reinforcement Learning via Self-Distillation by Hübotter, J., Lübeck, F., Behric, L., Baumann, A., Bagatella, M., Marta, D., Hakimi, I., Shenfeld, I., Kleine Buening, T., Guestrin, C. &amp; Krause, A. (2026)</title><link>https://hugocisneros.com/notes/hubotterreinforcementlearningselfdistillation2026/</link><pubDate>Sun, 26 Apr 2026 17:14:00 +0200</pubDate><guid>https://hugocisneros.com/notes/hubotterreinforcementlearningselfdistillation2026/</guid><description>tags Reinforcement learning, Distillation, Large language models, In-context learning source (Hübotter et al. 2026) Summary This paper introduces Self-Distillation Policy Optimization (SDPO), a new algorithm for post-training large language models with reinforcement learning. Current methods for RL with verifiable rewards (RLVR), such as GRPO, learn only from sparse scalar outcome rewards (e.g., pass/fail), creating a severe credit-assignment bottleneck. 
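To make the credit-assignment bottleneck concrete, here is a sketch of GRPO-style group-relative advantages. This is not SDPO itself. The function names are hypothetical, and using the population standard deviation is an assumption, since implementations differ. Each sampled completion's scalar reward is normalized within its group, and the resulting advantage is copied to every token of that completion.

```python
import statistics


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages for one prompt: (r_i - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def per_token_advantages(rewards, lengths):
    """Broadcast each completion's scalar advantage to all of its tokens.

    Every token in a completion gets the same value, whether or not it had
    anything to do with the pass/fail outcome.
    """
    return [[adv] * n for adv, n in zip(grpo_advantages(rewards), lengths)]


# One prompt, four sampled completions with pass/fail (1/0) outcome rewards.
print(per_token_advantages([1.0, 0.0, 0.0, 1.0], [3, 5, 2, 4]))
```

For rewards `[1, 0, 0, 1]` the advantages are roughly `[1, -1, -1, 1]`, identical across all tokens of each completion. A group in which every attempt fails yields all-zero advantages and therefore no learning signal.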
The authors formalize a more general setting called Reinforcement Learning with Rich Feedback (RLRF), where environments provide tokenized feedback (runtime errors, judge evaluations) explaining why an attempt failed.</description></item><item><title>Notes on: Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding by Christopher Clark, Jieyu Zhang, Zixian Ma, Jae Sung Park, Mohammadreza Salehi, Rohun Tripathi, Sangho Lee, Zhongzheng Ren, Chris Dongjoo Kim, Yinuo Yang, Vincent Shao, Yue Yang, Weikai Huang, Ziqi Gao, Taira Anderson, Jianrui Zhang, Jitesh Jain, George Stoica, Winson Han, Ali Farhadi, Ranjay Krishna (2026)</title><link>https://hugocisneros.com/notes/clarkmolmo2video2026/</link><pubDate>Sun, 26 Apr 2026 13:55:00 +0200</pubDate><guid>https://hugocisneros.com/notes/clarkmolmo2video2026/</guid><description>tags Vision Language Models, Grounding, Synthetic training data, Spatial Reasoning, Foundation models source (Clark et al. 2026) Summary Molmo2 is a family of fully open Vision Language Models (4B, 8B built on Qwen3, and a 7B built on OLMo) trained without distillation from proprietary systems. The work closes the open-source gap for video-capable VLMs with a core focus on Grounding — producing pixel-level pointing and tracking outputs across single images, multi-image sets, and videos — a capability that even proprietary models largely lack.</description></item><item><title>Vision Language Models</title><link>https://hugocisneros.com/notes/vision_language_models/</link><pubDate>Sun, 26 Apr 2026 13:55:00 +0200</pubDate><guid>https://hugocisneros.com/notes/vision_language_models/</guid><description>tags LLM, Vision transformer Vision language models (VLMs) are generative AI models trained on both text and images. They can be effective tools for Image classification.
CLIP is an early and influential example of contrastive vision-language pre-training. VLMs are a class of Foundation models.
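Below is a sketch of a CLIP-style symmetric contrastive (InfoNCE) objective in plain Python. It assumes precomputed image and text embeddings for a batch of matched pairs. The fixed temperature of 0.07 is an assumption: CLIP learns the temperature, starting from that value. All names are hypothetical.

```python
import math


def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (image, text) embeddings.

    Pair i is the positive for row i and column i; every other pairing in the
    batch acts as a negative.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(im, tx)) / temperature for tx in txts] for im in imgs]
    img_to_txt = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    txt_to_img = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (img_to_txt + txt_to_img) / 2


images = [[1.0, 0.0], [0.0, 1.0]]
print(clip_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: close to 0
print(clip_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # mismatched pairs: large
```

When all embeddings in a batch coincide, the loss equals `log(batch_size)`, the chance level.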
Remote Sensing VLMs VLMs for remote sensing are maturing rapidly (as of early 2026), approaching the point where analysts can query satellite archives in natural language and receive spatially grounded, visually referenced answers.</description></item><item><title> Notes on: DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning by Ziwei Zheng, Michael Yang, Jack Hong, Chenxiao Zhao, Guohai Xu, Le Yang, Chao Shen, Xing Yu (2025)</title><link>https://hugocisneros.com/notes/zhengdeepeyesincentivizing2025/</link><pubDate>Sun, 26 Apr 2026 13:54:00 +0200</pubDate><guid>https://hugocisneros.com/notes/zhengdeepeyesincentivizing2025/</guid><description>tags Vision Language Models, Reinforcement learning, GRPO, Tool calling, Grounding, Reinforcement learning with verifiable rewards source (Zheng et al. 2025) Summary DeepEyes is a Vision-Language Model that learns to &amp;ldquo;think with images&amp;rdquo; — interleaving textual chain-of-thought with self-initiated image zoom-ins during reasoning — and is trained purely with end-to-end Reinforcement learning, without any cold-start supervised fine-tuning on synthetic reasoning traces. 
Built on Qwen2.5-VL-7B and optimized with GRPO, the model emits grounding coordinates whose cropped patches are fed back into the context as new observation tokens, forming what the authors call an interleaved Multimodal Chain-of-Thought (iMCoT).</description></item><item><title>Notes on: GeoEyes: On-Demand Visual Focusing for Evidence-Grounded Understanding of Ultra-High-Resolution Remote Sensing Imagery by Fengxiang Wang, Mingshuo Chen, Yueying Li, Yajie Yang, Yifan Zhang, Long Lan, Xue Yang, Hongda Sun, Yulin Wang, Di Wang, Jun Song, Jing Zhang, Bo Du (2026)</title><link>https://hugocisneros.com/notes/wanggeoeyesondemand2026/</link><pubDate>Sun, 26 Apr 2026 13:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/wanggeoeyesondemand2026/</guid><description>tags Vision Language Models, Geospatial AI, Tool calling, GRPO, Reinforcement learning source (Wang et al. 2026) Summary GeoEyes addresses visual question answering (VQA) on ultra-high-resolution (UHR) remote sensing imagery — scenes where task-relevant cues occupy only tiny fractions of the full image. The authors target the &amp;ldquo;thinking-with-images&amp;rdquo; paradigm, in which Vision Language Models interleave textual reasoning with active zoom_in Tool calling to acquire high-resolution evidence on demand. The paper diagnoses a systematic failure of existing zoom-enabled MLLMs (e.</description></item><item><title>Notes on: LoRA Learns Less and Forgets Less by Dan Biderman, Jacob Portes, Jose Javier Gonzalez Ortiz, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham (2024)</title><link>https://hugocisneros.com/notes/bidermanloralearns2024/</link><pubDate>Sun, 26 Apr 2026 13:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/bidermanloralearns2024/</guid><description>tags LLM, Continual learning, Catastrophic forgetting, Transfer learning, Language modeling source (Biderman et al. 
2024) Summary This paper provides a rigorous head-to-head comparison of Low-Rank Adaptation (LoRA) against full finetuning of Llama-2-7B on two challenging target domains (code and math) under two training regimes: continued pretraining (CPT, ~20B unlabeled tokens) and instruction finetuning (IFT, ~100K prompt–response pairs). The central question is: under which conditions does LoRA approximate full finetuning accuracy, and to what extent does it mitigate catastrophic forgetting of base model capabilities?</description></item><item><title>Supervised Fine Tuning</title><link>https://hugocisneros.com/notes/supervised_fine_tuning/</link><pubDate>Sun, 26 Apr 2026 13:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/supervised_fine_tuning/</guid><description>tags Large language models, Foundation models, Catastrophic forgetting, Continual learning Off-policy adaptation of a pretrained model by training on (input, target) pairs from expert demonstrations using cross-entropy loss.
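Here is a sketch of the SFT objective for one (prompt, response) sequence. It assumes the common convention of masking prompt positions, so that only the demonstrated target tokens contribute to the cross-entropy. It also assumes `logits[t]` scores the token at position `t + 1`. All names are hypothetical.

```python
import math


def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def sft_loss(logits, tokens, prompt_len):
    """Mean cross-entropy over the response tokens of one (prompt, response) sequence.

    Assumes logits[t] scores the token at position t + 1 given tokens[0..t], and that
    the first prompt_len tokens are the prompt (input), which is masked out.
    """
    losses = []
    for t in range(len(tokens) - 1):
        target_pos = t + 1
        if target_pos >= prompt_len:  # only supervise the demonstrated target tokens
            losses.append(-log_softmax(logits[t])[tokens[target_pos]])
    return sum(losses) / len(losses)


# Prompt = tokens[:2], response = tokens[2:]; the vocabulary has 3 tokens.
tokens = [0, 1, 2, 1]
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 20.0], [0.0, 20.0, 0.0]]
print(sft_loss(logits, tokens, prompt_len=2))
```

The targets come from the demonstration rather than from the model's own samples, so this loss is off-policy by construction.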
SFT is the dominant post-training recipe for skill and knowledge injection, but it is prone to catastrophic forgetting due to its off-policy nature.</description></item><item><title>Notes on: Residual Matrix Transformers: Scaling the Size of the Residual Stream by Brian Mak, Jeffrey Flanigan (2025)</title><link>https://hugocisneros.com/notes/makresidualmatrixtransformers2025/</link><pubDate>Sun, 26 Apr 2026 13:45:00 +0200</pubDate><guid>https://hugocisneros.com/notes/makresidualmatrixtransformers2025/</guid><description>tags Transformers, LLM, Scaling laws, Attention, Residual neural networks, Memory in neural networks source (Mak, Flanigan 2025) Summary Standard Transformers use a residual stream of dimension \(D\) as a &amp;ldquo;memory bus&amp;rdquo; where every layer reads and writes features (Elhage et al., 2021). Resizing this bus also resizes every weight matrix, so the bandwidth of the residual stream is structurally tied to parameter count and per-token FLOPs. The Residual Matrix Transformer (RMT) breaks this coupling by replacing the residual vector at each token position with an outer-product memory matrix \(M \in \mathbb{R}^{D_k \times D_v}\) (Kohonen, 1972; Anderson, 1972).</description></item><item><title>Knowledge Base Index</title><link>https://hugocisneros.com/notes/notes/</link><pubDate>Sun, 26 Apr 2026 00:00:00 +0200</pubDate><guid>https://hugocisneros.com/notes/notes/</guid><description>Notes by Topic ALife 2020 Talk: Alife 2020 keynote Lee Cronin - A Top Down Chemically Embodied Artificial Life Computation (9 links, 1 backlinks) Talk: Alife 2020 keynote Michael Levin - Robot Cancer (4 links, 1 backlinks) Talk: Alife 2020 keynote Luis Zaman - New Frontiers in Alife: What was old is new again (14 links, 1 backlinks) ALife Conference ALife 2020 (24 links, 9 backlinks) Agent Grounding (4 links, 8 backlinks) Tool calling (3 links, 6 backlinks) Geospatial AI (11 links, 4 backlinks) Coding agent (6 links, 4 backlinks) Model Context Protocol (6 links, 2 
backlinks) Algorithm Gradient descent (2 links, 10 backlinks) GRPO (3 links, 6 backlinks) Search algorithms (1 links, 5 backlinks) Dijkstra&amp;rsquo;s algorithm (2 links, 2 backlinks) Backpropagation (2 links, 2 backlinks) PPO (3 links, 1 backlinks) Levenshtein distance (2 links, 1 backlinks) Frank-Wolfe algorithm (2 links, 1 backlinks) Generative art (2 links, 1 backlinks) Graham scan (1 links, 0 backlinks) Fast Marching method (3 links, 0 backlinks) Adaptive Computation Time (4 links, 0 backlinks) Stable marriage problem (1 links, 0 backlinks) Levenshtein automata (3 links, 0 backlinks) Knuth-Morris-Pratt string-searching algorithm (2 links, 0 backlinks) Pattern-defeating quicksort (1 links, 0 backlinks) Algorithmic Information theory Kolmogorov complexity (4 links, 5 backlinks) Halting probability (3 links, 1 backlinks) Minimum description length (2 links, 1 backlinks) Algorithmic probability (2 links, 0 backlinks) Applied maths Machine learning (6 links, 64 backlinks) Optimization (2 links, 17 backlinks) Dynamical systems (2 links, 13 backlinks) Cryptography (2 links, 12 backlinks) Statistics (1 links, 7 backlinks) Signal processing (1 links, 5 backlinks) Image processing (3 links, 5 backlinks) Noise (2 links, 3 backlinks) Linear Attention (4 links, 3 backlinks) Ordinary least squares (2 links, 2 backlinks) Dijkstra&amp;rsquo;s algorithm (2 links, 2 backlinks) Optimal transport (1 links, 2 backlinks) Attractor networks (4 links, 2 backlinks) Nyström method (1 links, 1 backlinks) Generalization in Machine learning (2 links, 1 backlinks) Softmax (3 links, 1 backlinks) Kullback-leibler divergence (1 links, 1 backlinks) SIR model (1 links, 1 backlinks) Fast Marching method (3 links, 0 backlinks) Optimal control (1 links, 0 backlinks) Wavelets (3 links, 0 backlinks) System of linear equations (3 links, 0 backlinks) Diffusion limited aggregation (2 links, 0 backlinks) Automatic differentiation (2 links, 0 backlinks) Art Generative art (2 links, 1 backlinks) Art with 
Cellular Automata (4 links, 0 backlinks) Creative coding (2 links, 0 backlinks) Artificial Intelligence Machine learning (6 links, 64 backlinks) Cellular automata (18 links, 51 backlinks) Open-ended Evolution (11 links, 19 backlinks) Artificial intelligence test (2 links, 7 backlinks) Robotics (2 links, 3 backlinks) Alan Turing (3 links, 2 backlinks) The Scaling Hypothesis (2 links, 2 backlinks) Article: Open-endedness: The last grand challenge you’ve never heard of (3 links, 1 backlinks) Neuroscience (2 links, 1 backlinks) The Bitter Lesson (4 links, 0 backlinks) AI and climate change (3 links, 0 backlinks) Motion planning (1 links, 0 backlinks) Why programming is a good medium for expressing poorly understood and sloppily-formulated ideas (5 links, 0 backlinks) Talk: The Importance of Open-Endedness in AI and Machine Learning (13 links, 0 backlinks) AI capitalism (2 links, 0 backlinks) Automation (2 links, 0 backlinks) Artificial intelligence test Abstraction and Reasoning Corpus (1 links, 1 backlinks) Chinese room experiment (1 links, 1 backlinks) Turing test (3 links, 1 backlinks) Raven&amp;rsquo;s progressive matrices (3 links, 0 backlinks) Bongard problems (1 links, 0 backlinks) Artificial life Evolution (2 links, 27 backlinks) ALife 2020 (24 links, 9 backlinks) Robotics (2 links, 3 backlinks) Christopher Langton (6 links, 2 backlinks) Tierra (2 links, 2 backlinks) Avida (4 links, 1 backlinks) ALife Conference (5 links, 1 backlinks) Talk: Alife 2020 keynote Luis Zaman - New Frontiers in Alife: What was old is new again (14 links, 1 backlinks) Emergence in artificial life (2 links, 0 backlinks) Attention Positional encoding (2 links, 4 backlinks) Linear Attention (4 links, 3 backlinks) Attention graph networks (2 links, 1 backlinks) BART mBART (3 links, 0 backlinks) DQ-BART (4 links, 0 backlinks) BERT Vision transformer (3 links, 8 backlinks) Megatron (4 links, 1 backlinks) DistillBERT (4 links, 1 backlinks) RoBERTa (3 links, 1 backlinks) ALBERT (3 links, 0 
backlinks) ERNIE (3 links, 0 backlinks) Biological life Morphogenesis (2 links, 1 backlinks) RNA-world (1 links, 1 backlinks) Neuroscience (2 links, 1 backlinks) Ontogeny recapitulates phylogeny (2 links, 1 backlinks) Talk: Alife 2020 keynote Michael Levin - Robot Cancer (4 links, 1 backlinks) Article: Why Sex?</description></item><item><title>Open-vocabulary detection</title><link>https://hugocisneros.com/notes/open_vocabulary_detection/</link><pubDate>Sun, 19 Apr 2026 14:06:00 +0200</pubDate><guid>https://hugocisneros.com/notes/open_vocabulary_detection/</guid><description>tags Object recognition, Vision Language Models, Grounding, Computer vision Object detection where the class vocabulary is unbounded and specified at inference via natural-language phrases or visual exemplars. It requires vision-language alignment and calibrated rejection of hard negatives.</description></item><item><title>Image segmentation</title><link>https://hugocisneros.com/notes/image_segmentation/</link><pubDate>Sun, 19 Apr 2026 14:05:00 +0200</pubDate><guid>https://hugocisneros.com/notes/image_segmentation/</guid><description>tags Computer vision, Image processing, Object recognition, Foundation models Predicting per-pixel labels (instance, semantic, panoptic) for an image, including the open-vocabulary regime where target classes are specified by text or visual exemplars at inference time.</description></item><item><title>Reward hacking</title><link>https://hugocisneros.com/notes/reward_hacking/</link><pubDate>Sun, 19 Apr 2026 14:04:00 +0200</pubDate><guid>https://hugocisneros.com/notes/reward_hacking/</guid><description>tags Reinforcement learning, Reinforcement learning with verifiable rewards, GRPO Pathologies where agents exploit literal reward structure (e.g., spamming tool use without accuracy gains).</description></item><item><title>Reward shaping</title><link>https://hugocisneros.com/notes/reward_shaping/</link><pubDate>Sun, 19 Apr 2026 14:04:00 +0200</pubDate><guid>https://hugocisneros.com/notes/reward_shaping/</guid><description>tags Reinforcement learning, Reinforcement learning with verifiable rewards, GRPO Patterns for designing RL rewards that produce intended behaviors (e.g., conditional bonuses that trigger only on correct outcomes).</description></item><item><title>Chain-of-Thought reasoning</title><link>https://hugocisneros.com/notes/chain_of_thought_reasoning/</link><pubDate>Sun, 19 Apr 2026 14:03:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chain_of_thought_reasoning/</guid><description>tags LLM, Test-time compute, Reinforcement learning, Token-level credit assignment in reasoning traces Prompting and training paradigm where models emit intermediate reasoning steps before a final answer, improving multi-step problem solving and enabling RL on verifiable outcomes.</description></item><item><title>Geospatial AI</title><link>https://hugocisneros.com/notes/geospatial_ai/</link><pubDate>Sun, 19 Apr 2026 14:00:00 +0200</pubDate><guid>https://hugocisneros.com/notes/geospatial_ai/</guid><description>tags Machine learning, LLM, Agent, Computer vision, Retrieval augmented generation The application of AI and LLMs to geospatial data like satellite imagery, GIS databases, mapping, and Earth observation. LLMs have become production-grade tools across the geospatial industry in 2024–2026.
Enterprise GIS + LLMs The dominant deployment pattern (as of early 2026) is agentic orchestration: LLMs serve as reasoning layers that invoke specialized GIS tools, spatial databases, and domain-specific Foundation models rather than embedding all geospatial knowledge in model weights.</description></item><item><title>Visual question answering</title><link>https://hugocisneros.com/notes/visual_question_answering/</link><pubDate>Sun, 19 Apr 2026 13:55:00 +0200</pubDate><guid>https://hugocisneros.com/notes/visual_question_answering/</guid><description>tags Vision Language Models, Spatial Reasoning, Grounding Task of answering natural-language questions grounded in image content, spanning global scene understanding to fine-grained perception and compositional reasoning.</description></item><item><title>Coding agent</title><link>https://hugocisneros.com/notes/coding_agent/</link><pubDate>Sun, 19 Apr 2026 13:54:00 +0200</pubDate><guid>https://hugocisneros.com/notes/coding_agent/</guid><description>tags Machine learning, Program synthesis, Agent An LLM agent (based on a generic or specialized model) dedicated to programming.
One popular way of training effective coding agents is agentic reinforcement learning.</description></item><item><title>Notes on: Meta-Harness: End-to-End Optimization of Model Harnesses by Lee, Y., Nair, R., Zhang, Q., Lee, K., Khattab, O., &amp; Finn, C. (2026)</title><link>https://hugocisneros.com/notes/leemetaharnessendtoend2026/</link><pubDate>Sun, 19 Apr 2026 13:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/leemetaharnessendtoend2026/</guid><description>tags Machine learning, Optimization, Meta-learning, Program synthesis source (Lee et al. 2026) Summary This paper introduces Meta-Harness, an outer-loop system for automatically optimizing the &amp;ldquo;harness&amp;rdquo; of LLM applications, i.e., the code that determines what information to store, retrieve, and present to the model at each step. The key insight is that harness design matters as much as model weights: changing only the harness around a fixed LLM can produce a 6x performance gap on the same benchmark.</description></item><item><title>Agentic reinforcement learning</title><link>https://hugocisneros.com/notes/agentic_reinforcement_learning/</link><pubDate>Sun, 19 Apr 2026 13:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/agentic_reinforcement_learning/</guid><description>tags Reinforcement learning, Reinforcement learning with verifiable rewards, GRPO, Tool calling RL post-training for LLM/VLM agents that decide when and how to invoke tools during reasoning, with rewards shaped around tool-use policy (necessity, efficiency, trajectory geometry) rather than just final-answer correctness.</description></item><item><title>Multimodal reasoning</title><link>https://hugocisneros.com/notes/multimodal_reasoning/</link><pubDate>Sun, 19 Apr 2026 13:42:00 +0200</pubDate><guid>https://hugocisneros.com/notes/multimodal_reasoning/</guid><description>tags Vision Language Models, Tool calling, Spatial Reasoning, Grounding Multimodal reasoning paradigm where VLMs interleave 
textual chain-of-thought with active visual operations (zoom, crop, search) to acquire evidence on demand.</description></item><item><title>Foundation models</title><link>https://hugocisneros.com/notes/foundation_models/</link><pubDate>Sun, 19 Apr 2026 13:40:00 +0200</pubDate><guid>https://hugocisneros.com/notes/foundation_models/</guid><description>tags Machine learning, Self-supervised learning Large models trained on large amounts of domain data to learn useful patterns and serve as a basis for downstream machine learning applications. These models usually operate at the frontier of the scaling laws relating compute and data to performance.
These models can serve as bases for Transfer learning.
Some common classes of foundation models are: LLM, Vision transformer, and Vision Language Models.
Domain-specific Foundation Models Beyond language and vision, foundation models have been developed for specific domains where large-scale pre-training on domain data provides transferable representations.</description></item><item><title>Switch transformer</title><link>https://hugocisneros.com/notes/switch_transformer/</link><pubDate>Sun, 19 Apr 2026 13:39:00 +0200</pubDate><guid>https://hugocisneros.com/notes/switch_transformer/</guid><description> tags Transformers, T5, NLP paper (Fedus et al. 2022) Architecture This model increases the parameter count of a T5-like architecture while keeping per-token computation roughly constant, by routing each token to a single expert in a mixture of experts.
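This is a sketch of top-1 ("switch") routing with a per-expert capacity. It is written from the general description of Switch Transformers, not from the reference implementation. Assumptions:
- Router logits are given per token; in the model they come from a learned linear router.
- Expert capacity is `ceil(tokens / experts * capacity_factor)`.
- Overflow tokens are dropped here; in the actual layer they are carried forward by the residual connection.

All names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def switch_route(router_logits, num_experts, capacity_factor=1.0):
    """Top-1 routing: each token goes to its argmax expert until that expert is full.

    Returns one (expert index or None, gate probability) pair per token.
    """
    capacity = math.ceil(len(router_logits) / num_experts * capacity_factor)
    load = [0] * num_experts
    routes = []
    for scores in router_logits:
        probs = softmax(scores)
        expert = max(range(num_experts), key=lambda e: probs[e])
        if load[expert] >= capacity:
            routes.append((None, 0.0))  # overflow: no expert processes this token
        else:
            load[expert] += 1
            routes.append((expert, probs[expert]))
    return routes


def switch_layer(tokens, router_logits, experts, capacity_factor=1.0):
    """Apply each token's expert, scaled by its gate; dropped tokens output zeros here."""
    routes = switch_route(router_logits, len(experts), capacity_factor)
    return [
        [gate * y for y in experts[e](x)] if e is not None else [0.0] * len(x)
        for x, (e, gate) in zip(tokens, routes)
    ]


routes = switch_route([[1.0, 0.0]] * 4, num_experts=2)
print([e for e, _ in routes])  # -> [0, 0, None, None]
```

With four tokens that all prefer expert 0, two experts and a capacity factor of 1.0, each expert holds two tokens. The first two tokens are routed to expert 0 and the last two overflow. Each token only runs through one expert, so adding experts grows the parameter count without growing per-token compute.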
Parameter count 1T
Bibliography William Fedus, Barret Zoph, Noam Shazeer. June 16, 2022. "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity".</description></item><item><title>Notes on: Perception Encoder: The best visual embeddings are not at the output of the network by Daniel Bolya, Po-Yao Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, Chen Wei, Tengyu Ma, Jiale Zhi, Jathushan Rajasegaran, Hanoona Rasheed, Junke Wang, Marco Monteiro, Hu Xu, Shiyu Dong, Nikhila Ravi, Daniel Li, Piotr Dollár, Christoph Feichtenhofer (2025)</title><link>https://hugocisneros.com/notes/bolyaperceptionencoder2025/</link><pubDate>Sun, 19 Apr 2026 11:50:00 +0200</pubDate><guid>https://hugocisneros.com/notes/bolyaperceptionencoder2025/</guid><description>tags Vision Language Models, Computer vision, CLIP, Contrastive learning, Vision transformer, Foundation models source (Bolya et al. 2025) Summary This paper introduces Perception Encoder (PE), a family of vision encoders from Meta FAIR trained with a purely global CLIP-style contrastive vision-language objective that nonetheless produces state-of-the-art features for tasks as diverse as zero-shot classification, MLLM-based Q&amp;amp;A, grounding, object detection, tracking, and depth estimation. 
The central empirical finding is that general-purpose features already exist inside a well-trained CLIP model — but they live in intermediate layers, not at the output.</description></item><item><title>Notes on: V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning by Lorenzo Mur-Labadia, Matthew Muckley, Amir Bar, Mido Assran, Koustuv Sinha, Mike Rabbat, Yann LeCun, Nicolas Ballas, Adrien Bardes (2026)</title><link>https://hugocisneros.com/notes/murlabadiavjepa2026/</link><pubDate>Sun, 19 Apr 2026 11:49:00 +0200</pubDate><guid>https://hugocisneros.com/notes/murlabadiavjepa2026/</guid><description>tags Self-supervised learning, Vision transformer, Foundation models, Robotics source (Mur-Labadia et al. 2026) Summary V-JEPA 2.1 is a family of self-supervised video models (ViT-g/G, 1B/2B, plus distilled ViT-L/B variants) from FAIR at Meta that extends the Joint-Embedding Predictive Architecture (JEPA) line to produce representations that are simultaneously strong on dense spatio-temporal tasks (segmentation, tracking, depth, action anticipation) and global understanding tasks (action and image recognition). The paper&amp;rsquo;s central empirical observation is that when the masked-prediction loss is applied only to masked tokens (as in V-JEPA 2), the encoder has no incentive to encode fine-grained local structure in context tokens, so they collapse into register-like global aggregators and dense downstream performance suffers.</description></item><item><title>Notes on: End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko (2020)</title><link>https://hugocisneros.com/notes/carionendtoendobject2020/</link><pubDate>Sun, 19 Apr 2026 11:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/carionendtoendobject2020/</guid><description>tags Transformers, Attention, Computer vision, Object recognition, Positional encoding source (Carion et al. 
2020) Summary DETR (DEtection TRansformer) reframes object detection as a direct set prediction problem, eliminating hand-designed components that traditional detectors rely on: anchor generation, non-maximum suppression (NMS), and coordinate-regression heuristics against proposals. The model is a CNN backbone (ResNet-50 / ResNet-101) feeding a standard Transformer encoder-decoder. The decoder attends to the encoder output via a small fixed set of \(N\) learned object queries (with \(N\) much larger than the number of objects in any image) and, in parallel, produces \(N\) (class, box) predictions through a shared feed-forward head.</description></item><item><title>Notes on: SAM 3: Segment Anything with Concepts by Nicolas Carion, Laura Gustafson, Yuan-Ting Hu, Shoubhik Debnath, Ronghang Hu, Didac Suris, Chaitanya Ryali, Kalyan Vasudev Alwala, Haitham Khedr, Andrew Huang, Jie Lei, Tengyu Ma, Baishan Guo, Arpit Kalla, Markus Marks, Joseph Greer, Meng Wang, Peize Sun, Roman Rädle, Triantafyllos Afouras, Effrosyni Mavroudi, Katherine Xu, Tsung-Han Wu, Yu Zhou, Liliane Momeni, Rishi Hazra, Shuangrui Ding, Sagar Vaze, Francois Porcher, Feng Li, Siyuan Li, Aishwarya Kamath, Ho Kei Cheng, Piotr Dollár, Nikhila Ravi, Kate Saenko, Pengchuan Zhang, Christoph Feichtenhofer (2025)</title><link>https://hugocisneros.com/notes/carionsamsegmentanything2025/</link><pubDate>Sun, 19 Apr 2026 11:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/carionsamsegmentanything2025/</guid><description>tags Computer vision, Foundation models, Object recognition, Vision Language Models, Grounding, Synthetic training data source (Carion et al. 2025) Summary SAM 3 (Segment Anything Model 3) is Meta&amp;rsquo;s third installment of the SAM family of Foundation models for Computer vision. 
The headline contribution is a new task — Promptable Concept Segmentation (PCS) — which generalizes the SAM 1/2 Promptable Visual Segmentation (PVS) task from &amp;ldquo;segment one object given a click/box/mask&amp;rdquo; to &amp;ldquo;segment, identify, and track every instance of a visual concept given a short noun phrase, an image exemplar, or both, across an image or short video&amp;rdquo;.</description></item><item><title>Attention</title><link>https://hugocisneros.com/notes/attention/</link><pubDate>Sun, 12 Apr 2026 10:44:00 +0200</pubDate><guid>https://hugocisneros.com/notes/attention/</guid><description>tags Neural networks, Transformers Implementation Self-attention is a weighted average of all input elements from a sequence, with a weight proportional to a similarity score between representations. The input \(x \in \mathbb{R}^{L \times F}\) is projected by matrices \(W_Q \in \mathbb{R}^{F \times D}\), \(W_K \in \mathbb{R}^{F\times D}\) and \(W_V \in \mathbb{R}^{F\times M}\) to representations \(Q\) (queries), \(K\) (keys) and \(V\) (values).
\[ Q = xW_Q\] \[ K = xW_K\] \[ V = xW_V\]</description></item><item><title>Diffusion language models</title><link>https://hugocisneros.com/notes/diffusion_language_models/</link><pubDate>Sun, 12 Apr 2026 10:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/diffusion_language_models/</guid><description>tags LLM, Diffusion models, Language modeling, Transformers Language model architectures that use diffusion instead of autoregression. They generate text by iteratively denoising masked or noised tokens, which enables parallel decoding.</description></item><item><title>Notes on: DFlash: Block Diffusion for Flash Speculative Decoding by Jian Chen, Yesheng Liang, Zhijian Liu (2026)</title><link>https://hugocisneros.com/notes/chendflashblockdiffusion2026/</link><pubDate>Sun, 12 Apr 2026 10:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chendflashblockdiffusion2026/</guid><description>tags LLM, Diffusion models, Transformers, Test-time compute source (Chen et al. 2026) Summary DFlash is a speculative decoding framework that replaces the usual small autoregressive draft model with a lightweight block diffusion draft model. The draft model generates a whole block of tokens in a single forward pass, and the target LLM then verifies them in parallel. 
The authors argue that the main bottleneck of prior speculative decoding methods (e.</description></item><item><title>Generative modelling</title><link>https://hugocisneros.com/notes/generative_modelling/</link><pubDate>Sun, 12 Apr 2026 10:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/generative_modelling/</guid><description> tags Machine learning</description></item><item><title>Speculative Decoding</title><link>https://hugocisneros.com/notes/speculative_decoding/</link><pubDate>Sun, 12 Apr 2026 10:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/speculative_decoding/</guid><description>tags LLM, Optimization, Test-time compute, Language modeling The principle of speculative decoding for LLMs is based on the asymmetry of Transformer-based architectures between:
decoding tokens one by one, which requires one full forward pass through the model per token, and verifying multiple tokens at once, which requires only one full pass over a slightly longer sequence. In speculative decoding, a large, expensive model is augmented with a cheap draft model that generates draft tokens, which the large model then verifies in parallel.</description></item><item><title>Grounding</title><link>https://hugocisneros.com/notes/grounding/</link><pubDate>Thu, 09 Apr 2026 17:40:00 +0200</pubDate><guid>https://hugocisneros.com/notes/grounding/</guid><description> tags Machine learning, LLM, Agent, Evaluating NLP</description></item><item><title>Model Context Protocol</title><link>https://hugocisneros.com/notes/model_context_protocol/</link><pubDate>Thu, 09 Apr 2026 17:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/model_context_protocol/</guid><description>tags Agent, Coding agent, Multi-agent collaboration An open protocol (originally from Anthropic) that standardizes how AI agents connect to external tools and data sources. MCP defines a client-server architecture where agents (clients) invoke capabilities exposed by tool servers through a standard interface, similar to how USB-C standardizes device connectivity. It is closely related to Tool calling.
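MCP messages are JSON-RPC 2.0 messages. A client invokes a server-side tool with a tools/call request that carries the tool name and its arguments. The sketch below builds such a request and omits the transport, the initialization handshake, and error handling. The helper make_tool_call and the get_elevation tool are hypothetical.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "get_elevation" and its arguments are hypothetical; a real server
# advertises its tools and their input schemas in reply to "tools/list".
print(make_tool_call(1, "get_elevation", {"lat": 45.19, "lon": 5.72}))
```

Because every server accepts the same request shape, an agent can call tools from different servers without custom glue code for each one.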
Adoption in Geospatial As of early 2026, MCP is rapidly becoming the standard connector between AI agents and GIS tools.</description></item><item><title>3-SAT</title><link>https://hugocisneros.com/notes/3_sat/</link><pubDate>Thu, 09 Apr 2026 14:48:00 +0200</pubDate><guid>https://hugocisneros.com/notes/3_sat/</guid><description> tags Logic</description></item><item><title>Token-level credit assignment in reasoning traces</title><link>https://hugocisneros.com/notes/token_credit_assignment/</link><pubDate>Thu, 09 Apr 2026 14:48:00 +0200</pubDate><guid>https://hugocisneros.com/notes/token_credit_assignment/</guid><description>tags Reinforcement learning, Distillation, Self-training Three recent papers (MiniMax-M1 (CISPO), Zhang et al. (SSD), and Hübotter et al. (SDPO)) converge on a shared structural observation: not all tokens in a reasoning trace are equally important for learning, and naive uniform treatment of tokens is a core failure mode of current training methods.
The fork/filler distinction All three papers implicitly or explicitly distinguish between two kinds of positions in generated sequences:</description></item><item><title>GRPO</title><link>https://hugocisneros.com/notes/grpo/</link><pubDate>Thu, 09 Apr 2026 14:37:00 +0200</pubDate><guid>https://hugocisneros.com/notes/grpo/</guid><description> tags Reinforcement learning, Algorithm, Machine learning</description></item><item><title>PPO</title><link>https://hugocisneros.com/notes/ppo/</link><pubDate>Thu, 09 Apr 2026 14:37:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ppo/</guid><description> tags Reinforcement learning, Algorithm, Machine learning</description></item><item><title>Notes on: MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention by MiniMax (2025)</title><link>https://hugocisneros.com/notes/minimaxscalingtesttimecompute2025/</link><pubDate>Thu, 09 Apr 2026 14:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/minimaxscalingtesttimecompute2025/</guid><description>tags Foundation models, Reinforcement learning, Transformers, Scaling laws source (MiniMax 2025) Summary MiniMax-M1 is the first open-weight, large-scale reasoning model built on a hybrid attention architecture combining Transformers with lightning attention (a linear attention variant). The model uses a Mixture-of-Experts (MoE) design with 456 billion total parameters (45.9B activated per token) and natively supports 1 million token context length — 8x that of DeepSeek R1. The hybrid design alternates softmax attention blocks with linear attention (transnormer) blocks in a 1:7 ratio, enabling near-linear scaling of inference FLOPs with generation length.</description></item><item><title>Notes on: Attention Residuals by Kimi Team, Guangyu Chen, Yu Zhang, Jianlin Su et al. 
(2026)</title><link>https://hugocisneros.com/notes/chenattentionresiduals2026/</link><pubDate>Wed, 08 Apr 2026 18:16:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chenattentionresiduals2026/</guid><description>tags Transformers, LLM, Scaling laws, Attention, Residual neural networks source (Chen et al. 2026) Summary Standard residual connections in modern LLMs accumulate all layer outputs with fixed unit weights via PreNorm, causing uncontrolled hidden-state growth with depth and progressively diluting each layer&amp;rsquo;s contribution. This paper proposes Attention Residuals (AttnRes), which replaces this fixed accumulation with softmax attention over preceding layer outputs. Each layer selectively aggregates earlier representations using learned, input-dependent weights computed from a single pseudo-query vector per layer.</description></item><item><title>Linear Attention</title><link>https://hugocisneros.com/notes/linear_attention/</link><pubDate>Wed, 08 Apr 2026 18:14:00 +0200</pubDate><guid>https://hugocisneros.com/notes/linear_attention/</guid><description>tags Attention, Transformers, Machine learning, Applied maths Attention variants that replace the softmax with a dot product of kernel feature maps, reducing complexity from quadratic to linear in sequence length.</description></item><item><title>Notes on: Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention by Katharopoulos, A., Vyas, A., Pappas, N., &amp; Fleuret, F. (2020)</title><link>https://hugocisneros.com/notes/katharopoulostransformersarernns2020/</link><pubDate>Wed, 08 Apr 2026 18:14:00 +0200</pubDate><guid>https://hugocisneros.com/notes/katharopoulostransformersarernns2020/</guid><description>tags Transformers, RNN source (Katharopoulos et al. 2020) Summary Transformers have traditionally been described as different models from RNNs. This is because instead of processing the sequence one token at a time, Transformers use attention to process all elements simultaneously.
The paper introduces a new formulation that replaces the softmax similarity \(\exp(q^\top k / \sqrt{D})\) with a dot product of feature maps \(\phi(q)^\top \phi(k)\).
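The equivalence between the two views can be checked numerically. The NumPy sketch below assumes the elu(x) + 1 feature map from the paper, a single head, and no scaling factor. The parallel form builds an explicit causal matrix. The recurrent form carries only the running sums S and z, which is why the model can generate tokens like an RNN.

```python
import numpy as np


def elu_feature_map(x):
    # phi(x) = elu(x) + 1, which keeps every feature strictly positive.
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention_parallel(Q, K, V):
    """Causal linear attention computed with an explicit (L x L) matrix."""
    A = np.tril(elu_feature_map(Q) @ elu_feature_map(K).T)  # keep j up to i
    return (A @ V) / A.sum(axis=1, keepdims=True)


def linear_attention_recurrent(Q, K, V):
    """Same result as an RNN: a constant-size state updated once per token."""
    phi_q, phi_k = elu_feature_map(Q), elu_feature_map(K)
    S = np.zeros((K.shape[1], V.shape[1]))  # running sum of phi(k_j) v_j^T
    z = np.zeros(K.shape[1])                # running sum of phi(k_j)
    out = np.empty((Q.shape[0], V.shape[1]))
    for i in range(Q.shape[0]):
        S += np.outer(phi_k[i], V[i])
        z += phi_k[i]
        out[i] = (phi_q[i] @ S) / (phi_q[i] @ z)
    return out


rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 6, 4))
print(np.allclose(linear_attention_parallel(Q, K, V),
                  linear_attention_recurrent(Q, K, V)))  # True
```

The recurrent form costs O(1) per generated token, compared with O(i) for softmax attention at step i.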
This formulation yields time and memory complexity that is linear rather than quadratic in sequence length, as well as a causal, autoregressive model that can be run like an RNN.</description></item><item><title>Mixture of Experts</title><link>https://hugocisneros.com/notes/mixture_of_experts/</link><pubDate>Wed, 08 Apr 2026 18:11:00 +0200</pubDate><guid>https://hugocisneros.com/notes/mixture_of_experts/</guid><description>tags Transformers, LLM, Machine learning, Scaling laws Sparse neural network architecture that routes inputs to a subset of expert subnetworks, enabling parameter scaling without a proportional increase in compute.</description></item><item><title>Notes on: Embarrassingly Simple Self-Distillation Improves Code Generation by Zhang, R., Bai, R. H., Zheng, H., Jaitly, N., Collobert, R., &amp; Zhang, Y. (2026)</title><link>https://hugocisneros.com/notes/zhangembarrassinglysimpleselfdistillation2026/</link><pubDate>Wed, 08 Apr 2026 14:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/zhangembarrassinglysimpleselfdistillation2026/</guid><description>tags Distillation, Language modeling, Program synthesis, Large language models source (Zhang et al. 2026) Summary This paper introduces simple self-distillation (SSD), a method where an LLM improves its own code generation by sampling solutions from itself with specific temperature and truncation settings, then fine-tuning on those raw, unverified samples using standard supervised fine-tuning (cross-entropy loss).
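The sampling side of SSD can be sketched as below. The summary only says that specific temperature and truncation settings are used. The choice of top-k truncation, the function name, and the values shown are assumptions for illustration, not the settings from the paper. The resulting samples would then serve, unverified, as supervised fine-tuning targets.

```python
import numpy as np


def sample_truncated(logits, temperature, top_k, rng):
    """Temperature scaling followed by top-k truncation, then one sample."""
    scaled = np.asarray(logits, dtype=float) / temperature
    # Drop every token scoring below the k-th highest logit (ties are kept).
    cutoff = np.sort(scaled)[-top_k]
    scaled = np.where(scaled >= cutoff, scaled, -np.inf)
    probs = np.exp(scaled - scaled.max())  # exp(-inf) gives exactly 0
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


rng = np.random.default_rng(0)
logits = [2.0, 1.0, 0.5, -1.0]
samples = [sample_truncated(logits, 0.7, 2, rng) for _ in range(1000)]
print(sorted(set(samples)))  # only tokens 0 and 1 can appear
```

A temperature below 1 sharpens the distribution, and truncation removes the low-probability tail entirely.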
Crucially, SSD requires no external teacher model, no verifier (as opposed to Reinforcement learning with verifiable rewards), no execution environment, no reward model, and no reinforcement learning.</description></item><item><title>Test-time compute</title><link>https://hugocisneros.com/notes/test_time_compute/</link><pubDate>Wed, 08 Apr 2026 13:55:00 +0200</pubDate><guid>https://hugocisneros.com/notes/test_time_compute/</guid><description> tags Machine learning, LLM, Reinforcement learning</description></item><item><title>Reinforcement learning with verifiable rewards</title><link>https://hugocisneros.com/notes/reinforcement_learning_with_verifiable_rewards/</link><pubDate>Wed, 08 Apr 2026 13:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/reinforcement_learning_with_verifiable_rewards/</guid><description>tags Machine learning, Reinforcement learning, LLM This is related to RLHF, but instead of relying on human scoring of outputs, it uses programmatically verifiable outcomes (such as unit tests for code, math proofs, etc.).</description></item><item><title>Self-training</title><link>https://hugocisneros.com/notes/self_training/</link><pubDate>Wed, 08 Apr 2026 13:47:00 +0200</pubDate><guid>https://hugocisneros.com/notes/self_training/</guid><description>tags Machine learning, Distillation, Language modeling, LLM Implications for open-ended evolution Looking at self-training through the lens of Open-ended Evolution, it seems that pure self-training (a model learning only from its own outputs) cannot lead to radical improvement or novel behavior, since it is fundamentally limited by the distribution that the model is originally capable of modeling.
Only external inputs (harness, environment, etc.) can shift the model&amp;rsquo;s output distribution significantly, whereas self-training can refine an existing distribution, for example by pruning its tails or strengthening certain subspaces.</description></item><item><title>Synthetic training data</title><link>https://hugocisneros.com/notes/synthetic_training_data/</link><pubDate>Tue, 07 Apr 2026 19:26:00 +0200</pubDate><guid>https://hugocisneros.com/notes/synthetic_training_data/</guid><description>tags Machine learning, LLM, Transfer learning, Neural network training A way of self-training is to generate synthetic labels/data that are then used to retrain the model. This is for instance what happens in (Zhang et al. 2026).</description></item><item><title>Text embeddings</title><link>https://hugocisneros.com/notes/text_embeddings/</link><pubDate>Tue, 07 Apr 2026 19:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/text_embeddings/</guid><description>tags Machine learning, NLP Dense vector representations of text for retrieval, similarity, and classification tasks.
Embeddings can be trained so that texts with high Semantic similarity are mapped to nearby vectors.
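Cosine similarity is a common way to compare embeddings, although the note does not commit to a specific metric. The sketch below ranks documents by cosine similarity to a query. The toy vectors and the function names are illustrative only.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def rank_documents(query_emb, doc_embs):
    """Indices of documents, most similar to the query first."""
    scores = [cosine_similarity(query_emb, d) for d in doc_embs]
    return sorted(range(len(doc_embs)), key=lambda i: -scores[i])


# Toy 3-dimensional "embeddings" standing in for the output of a real encoder.
query = [1.0, 0.2, 0.0]
docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.1], [-1.0, 0.0, 0.0]]
print(rank_documents(query, docs))  # [1, 0, 2]
```

With unit-normalized embeddings, cosine similarity reduces to a plain dot product, so retrieval can use fast inner-product search.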
Learning text embeddings Transformers have been used to create text embeddings, in particular encoder-only architectures such as BERT.
Contrastive learning is one of the dominant training methods for producing high-quality text embeddings (SimCSE, E5, BGE).</description></item><item><title>In-context learning</title><link>https://hugocisneros.com/notes/in_context_learning/</link><pubDate>Tue, 07 Apr 2026 19:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/in_context_learning/</guid><description>tags Machine learning, LLM, Meta-learning In-context learning is the mechanism through which LLMs perform few-shot learning without gradient updates.</description></item><item><title>Scaling laws</title><link>https://hugocisneros.com/notes/scaling_laws/</link><pubDate>Tue, 07 Apr 2026 19:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/scaling_laws/</guid><description>tags Machine learning, LLM, The Scaling Hypothesis Scaling laws inform the training and scaling of the largest models.</description></item><item><title>Contrastive learning</title><link>https://hugocisneros.com/notes/contrastive_learning/</link><pubDate>Tue, 07 Apr 2026 19:17:00 +0200</pubDate><guid>https://hugocisneros.com/notes/contrastive_learning/</guid><description>tags NLP, Retrieval augmented generation, Semantic similarity Contrastive learning is one of the dominant paradigms for Self-supervised learning.</description></item><item><title>Tool calling</title><link>https://hugocisneros.com/notes/tool_calling/</link><pubDate>Tue, 07 Apr 2026 19:12:00 +0200</pubDate><guid>https://hugocisneros.com/notes/tool_calling/</guid><description> tags Agent, LLM, Model Context Protocol</description></item><item><title>Semantic similarity</title><link>https://hugocisneros.com/notes/semantic_similarity/</link><pubDate>Tue, 07 Apr 2026 16:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/semantic_similarity/</guid><description>tags NLP, Evaluating NLP N-gram matching For two sequences \(x\) and \(\hat{x}\), we denote their sequences of \(n\)-grams by \(S_x^n\) and \(S_{\hat{x}}^n\). 
The number of matched \(n\)-grams between the two sentences is: \[ \sum_{w \in S_{\hat{x}}^n} \mathbb{I}[w \in S_{x}^n] \] with \(\mathbb{I}\) the indicator function.
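The sketch below computes the matched count, along with the exact-match precision and recall obtained by normalizing it by the number of \(n\)-grams in each sentence. It assumes whitespace tokenization, treats x as the reference and x_hat as the candidate, and uses illustrative function names.

```python
def ngrams(tokens, n):
    """The sequence S^n of n-grams of a token list, kept in order with repeats."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def exact_match_scores(reference, candidate, n):
    """Exact-P_n and Exact-R_n for reference x and candidate x_hat."""
    s_x, s_xhat = ngrams(reference, n), ngrams(candidate, n)
    in_x, in_xhat = set(s_x), set(s_xhat)
    # Indicator sums: count n-grams of one sentence found in the other.
    matched_xhat = sum(1 for w in s_xhat if w in in_x)
    matched_x = sum(1 for w in s_x if w in in_xhat)
    return matched_xhat / len(s_xhat), matched_x / len(s_x)


x = "the cat sat on the mat".split()
x_hat = "the cat is on the mat".split()
print(exact_match_scores(x, x_hat, 2))  # (0.6, 0.6)
```

Three of the five bigrams in each sentence ("the cat", "on the", "the mat") appear in the other, which gives 3/5 for both scores.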
From this we can construct the exact-match precision (Exact-\(P_n\)) and recall (Exact-\(R_n\)): \[ \text{Exact-}P_n = \frac{\sum_{w \in S_{\hat{x}}^n} \mathbb{I}[w \in S_{x}^n]}{|S_{\hat{x}}^n|} \] and \[ \text{Exact-}R_n = \frac{\sum_{w \in S_{x}^n} \mathbb{I}[w \in S_{\hat{x}}^n]}{|S_{x}^n|} \]</description></item><item><title>Self-supervised learning</title><link>https://hugocisneros.com/notes/self_supervised_learning/</link><pubDate>Tue, 07 Apr 2026 16:00:00 +0200</pubDate><guid>https://hugocisneros.com/notes/self_supervised_learning/</guid><description>tags Machine learning Definition Self-supervised learning (SSL) is a learning paradigm based on the idea of using information contained within the training data to build better representations of it. Self-supervised models are usually trained to predict hidden parts of the input data from its visible parts.
SSL in NLP Self-supervised learning has been used for a long time in NLP. In Language modeling, one tries to predict words from previous ones.</description></item><item><title>Spatial Reasoning</title><link>https://hugocisneros.com/notes/spatial_reasoning/</link><pubDate>Tue, 07 Apr 2026 15:58:00 +0200</pubDate><guid>https://hugocisneros.com/notes/spatial_reasoning/</guid><description>tags Geospatial AI, LLM, Evaluating NLP, In-context learning The ability to reason about spatial relationships, compose spatial information into coherent mental maps, and perform geometric reasoning. Spatial reasoning remains a fundamental capability gap for LLMs (as of mid-2025 benchmarks).
LLM Limitations GeoGramBench (May 2025): LLMs exceed 80% on local primitive recognition but never surpass 50% on global abstract integration—they cannot compose piecemeal spatial information into coherent mental maps. On RCC-8 topological relations, LLMs mislabel &amp;ldquo;disjoint&amp;rdquo; as &amp;ldquo;overlaps&amp;rdquo; ~80% of the time.</description></item><item><title>Notes on: CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery by Ao Qu, Han Zheng, Zijian Zhou, Yihao Yan, Yihong Tang, Shao Yong Ong, Fenglu Hong, Kaichen Zhou, Chonghe Jiang, Minwei Kong, Jiacheng Zhu, Xuan Jiang, Sirui Li, Cathy Wu, Bryan Kian Hsiang Low, Jinhua Zhao, Paul Pu Liang (2026)</title><link>https://hugocisneros.com/notes/qucoralautonomousmultiagent2026/</link><pubDate>Tue, 07 Apr 2026 11:15:00 +0200</pubDate><guid>https://hugocisneros.com/notes/qucoralautonomousmultiagent2026/</guid><description>tags Open-ended evolution, Co-evolution, Quality diversity, Genetic algorithms, Coding agent, LLM source (Qu et al. 2026) Summary CORAL is a framework for autonomous multi-agent evolution applied to open-ended discovery problems. Unlike prior LLM-driven evolutionary search systems (FunSearch, AlphaEvolve, OpenEvolve) that rely on fixed heuristics for parent selection, mutation, and population management, CORAL delegates these decisions to autonomous LLM agents. 
Each agent operates in an isolated workspace, iteratively proposing, evaluating, and refining candidate solutions while reading from and writing to a shared persistent memory system.</description></item><item><title>Multi-agent collaboration</title><link>https://hugocisneros.com/notes/multi_agent_collaboration/</link><pubDate>Tue, 07 Apr 2026 11:13:00 +0200</pubDate><guid>https://hugocisneros.com/notes/multi_agent_collaboration/</guid><description> tags LLM, Coding agent, Co-evolution</description></item><item><title>Agent</title><link>https://hugocisneros.com/notes/agent/</link><pubDate>Tue, 07 Apr 2026 11:12:00 +0200</pubDate><guid>https://hugocisneros.com/notes/agent/</guid><description> tags LLM</description></item><item><title>Notes on: Training Language Models via Neural Cellular Automata by Dan Lee, Seungwook Han, Akarsh Kumar, Pulkit Agrawal (2026)</title><link>https://hugocisneros.com/notes/leetraininglanguagemodels2026/</link><pubDate>Tue, 07 Apr 2026 11:11:00 +0200</pubDate><guid>https://hugocisneros.com/notes/leetraininglanguagemodels2026/</guid><description>tags Cellular automata, Neural cellular automata, Transfer learning, Kolmogorov complexity source (Lee et al. 2026) Summary This paper proposes using neural cellular automata (NCA) as a source of synthetic, non-linguistic data for pre-pre-training large language models: an initial training phase on synthetic data that precedes standard pre-training on natural language corpora.
The authors use 2D discrete NCA on a 12x12 grid with 10 states, where the transition rule is parameterized by a randomly sampled neural network (3x3 convolution + MLP).</description></item><item><title>Notes on: Gecko: Versatile Text Embeddings Distilled from Large Language Models by Jinhyuk Lee, et al. (2024)</title><link>https://hugocisneros.com/notes/leegeckoversatiletext2024/</link><pubDate>Tue, 07 Apr 2026 10:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/leegeckoversatiletext2024/</guid><description>tags NLP, LLM, Semantic similarity source (Lee et al. 2024) Summary Gecko is a compact text embedding model (1.2B parameters) from Google DeepMind that achieves strong retrieval performance by distilling knowledge from LLMs into a retriever. The key innovation is a two-step LLM distillation process called FRet (Few-shot prompted Retrieval). First, an LLM generates diverse synthetic task descriptions and queries from sampled web passages. Second, the LLM refines data quality by reranking retrieved candidate passages using two scoring functions (query likelihood and relevance classification), selecting better positive and hard negative passages than the original seed passages.</description></item><item><title>Knowledge distillation</title><link>https://hugocisneros.com/notes/knowledge_distillation/</link><pubDate>Tue, 07 Apr 2026 10:18:00 +0200</pubDate><guid>https://hugocisneros.com/notes/knowledge_distillation/</guid><description/></item><item><title>Complexity metrics</title><link>https://hugocisneros.com/notes/complexity_metrics/</link><pubDate>Sun, 05 Apr 2026 17:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/complexity_metrics/</guid><description>tags Complexity To study the complexity of various systems, researchers have come up with various metrics. They are based on several principles such as Information theory or Algorithmic Information theory. Many of these metrics are described in (Grassberger 1989).
Shannon entropy and Kolmogorov Complexity The paper (Grunwald, Vitányi 2004) is a great description and analysis of two of the most important Complexity metrics:
Shannon entropy and Kolmogorov complexity. Information-theoretic metrics: Shannon entropy. AIT-based metrics: For a universal computer \(U\), the algorithmic information of \(S\) relative to \(U\) is defined as the length of the shortest program that yields \(S\) on \(U\).</description></item><item><title>LLM</title><link>https://hugocisneros.com/notes/llm/</link><pubDate>Sun, 05 Apr 2026 17:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/llm/</guid><description>tags Machine learning, NLP, Language modeling A large language model: the name now given mostly to large Transformer-based architectures, usually with at least several billion parameters. Retrieval augmented generation</description></item><item><title>Retrieval augmented generation</title><link>https://hugocisneros.com/notes/retrieval_augmented_generation/</link><pubDate>Sun, 05 Apr 2026 14:42:00 +0200</pubDate><guid>https://hugocisneros.com/notes/retrieval_augmented_generation/</guid><description>tags Machine learning, LLM RAG stands for &amp;ldquo;Retrieval augmented generation&amp;rdquo;, which is a type of LLM Harness for conditioning the generated text based on data accessed in a database using a query.</description></item><item><title>Notes on: Large Language Models as Optimizers by Yang, C., Wang, X., Lu, Y., Liu, H., Le, Q. V., Zhou, D., &amp; Chen, X. (2023)</title><link>https://hugocisneros.com/notes/yanglargelanguagemodels2023/</link><pubDate>Sun, 05 Apr 2026 12:00:00 +0200</pubDate><guid>https://hugocisneros.com/notes/yanglargelanguagemodels2023/</guid><description>tags Optimization, Machine learning, Natural language processing source (Yang et al. 2023) Summary This paper proposes Optimization by PROmpting (OPRO), a framework that uses large language models as general-purpose optimizers. 
Rather than relying on formal problem specifications or gradient-based methods, OPRO describes optimization tasks in natural language and leverages the LLM&amp;rsquo;s ability to recognize patterns from previously evaluated solutions to iteratively propose better ones.
The core mechanism is a meta-prompt that contains two components: (1) the optimization problem description with task exemplars, and (2) an optimization trajectory of previously generated solutions paired with their objective values, sorted in ascending order.</description></item><item><title>LLM Harness</title><link>https://hugocisneros.com/notes/llm_harness/</link><pubDate>Sat, 04 Apr 2026 23:40:00 +0200</pubDate><guid>https://hugocisneros.com/notes/llm_harness/</guid><description>tags Machine learning, Language modeling It is the scaffolding code (prompts, retrieval, memory, context management) that wraps an LLM and determines what information it sees.</description></item><item><title>Q: What do the notes say about geospatial analysis?</title><link>https://hugocisneros.com/notes/2026-04-04-geospatial-analysis/</link><pubDate>Sat, 04 Apr 2026 00:00:00 +0200</pubDate><guid>https://hugocisneros.com/notes/2026-04-04-geospatial-analysis/</guid><description>Answer The knowledge base does not contain notes specifically about geospatial analysis, GIS, remote sensing, satellite imagery, or related topics. A search across all ~490 notes found no entries covering these subjects.
The term &amp;ldquo;spatial&amp;rdquo; appears in several notes, but always in different contexts:
Spatial priors in neural networks: CNN and related notes discuss spatial priors like translation invariance in deep learning. Spatial organization in biology: Talk: Alife 2020 keynote Michael Levin - Robot Cancer mentions 3D spatial organisation of cells.</description></item><item><title>Giffen goods</title><link>https://hugocisneros.com/notes/giffen_goods/</link><pubDate>Tue, 29 Oct 2024 08:26:00 +0100</pubDate><guid>https://hugocisneros.com/notes/giffen_goods/</guid><description> tags Economics</description></item><item><title>Automation</title><link>https://hugocisneros.com/notes/automation/</link><pubDate>Wed, 10 May 2023 20:45:00 +0200</pubDate><guid>https://hugocisneros.com/notes/automation/</guid><description> tags Economics, Artificial Intelligence Labor and automation resources Interview with Juan Sebastian Carbonell, Interview with Aaron Benanav</description></item><item><title>Quantization</title><link>https://hugocisneros.com/notes/quantization/</link><pubDate>Mon, 01 May 2023 08:18:00 +0200</pubDate><guid>https://hugocisneros.com/notes/quantization/</guid><description>tags Computer science, Neural networks The goal of quantization in neural networks is to make them more efficient by simplifying their computations. This is done by replacing floating-point operations with operations on smaller number types, such as 8-bit integers (quantization of the parameters). The challenge is to preserve the accuracy of the model while doing this conversion.
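One basic scheme is absmax (symmetric) quantization to 8-bit integers. Each tensor is scaled by its largest absolute value so that it fits in [-127, 127], then rounded, and the integers are stored together with a single float scale. The per-tensor sketch below is a generic illustration. It is not the LLM.int8() method, which refines this idea with vector-wise scales and special handling of outlier features.

```python
import numpy as np


def absmax_quantize(w):
    """Symmetric int8 quantization: w is approximately scale * q."""
    scale = np.max(np.abs(w)) / 127.0  # assumes w is not all zeros
    q = np.round(w / scale).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float32) * scale


w = np.array([0.5, -1.27, 0.02])
q, scale = absmax_quantize(w)
print(q.tolist())                              # [50, -127, 2]
print(np.abs(w - dequantize(q, scale)).max())  # rounding error, at most scale / 2
```

A single large outlier inflates the scale and pushes small weights toward zero, which is why finer-grained scales help in practice.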
Quantization of large language models The LLM.int8() paper (Dettmers et al. 2022) explains some interesting issues and solutions for quantization of transformer-based large language models.</description></item><item><title>Universal basic income</title><link>https://hugocisneros.com/notes/universal_basic_income/</link><pubDate>Mon, 10 Apr 2023 22:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/universal_basic_income/</guid><description>tags Economics, Economic liberalism Definition Haagh defines UBI as the desire to ‘give all residents a modest regular income grant that is not dependent on means-tests or work-requirements’ (Haagh 2019).
Some critics of UBI In (Harris 2023), the author argues that rather than disrupting capitalism, UBI implementations risk reinforcing the modalities of thought that neoliberalism uses to govern.
Bibliography Louise Haagh. 2019. The Case for Universal Basic Income. Polity. Neal Harris.</description></item><item><title>Gopher</title><link>https://hugocisneros.com/notes/gopher/</link><pubDate>Wed, 22 Feb 2023 13:28:00 +0100</pubDate><guid>https://hugocisneros.com/notes/gopher/</guid><description> tags Transformers, GPT paper (Rae et al. 2022) Architecture This model is very similar to GPT-2 but uses RMSNorm instead of LayerNorm and relative positional encoding rather than absolute positional encoding.
Parameter count 280B
Bibliography Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, et al. January 21, 2022. "Scaling Language Models: Methods, Analysis &amp; Insights from Training Gopher". DOI.</description></item><item><title>Online privacy</title><link>https://hugocisneros.com/notes/online_privacy/</link><pubDate>Wed, 22 Feb 2023 13:28:00 +0100</pubDate><guid>https://hugocisneros.com/notes/online_privacy/</guid><description>tags Privacy IP Addresses In (Mishra et al. 2020), the authors analyze a set of users&amp;rsquo; internet traffic for more than 100 days. They observed that a little more than 11% of the 34,488 IP addresses they collected were present for more than a month. Many of them were reused throughout the whole experiment, making long-term tracking of users possible.
The study also shows that 93% of users had a unique fixed set of IP addresses during the whole experiment, making it easy to track them between home, work, etc.</description></item><item><title>Sparrow</title><link>https://hugocisneros.com/notes/sparrow/</link><pubDate>Wed, 22 Feb 2023 13:28:00 +0100</pubDate><guid>https://hugocisneros.com/notes/sparrow/</guid><description>tags Transformers, GPT, Chinchilla paper (Glaese et al. 2022) blog post Deepmind announcement blog post Architecture Starts from the Chinchilla 70B model but adds RLHF (Reinforcement Learning with Human Feedback). It also adds inline evidence like GopherCite.
Parameter count 70B
Bibliography Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, Timo Ewalds, Maribeth Rauh, et al. September 28, 2022. "Improving Alignment of Dialogue Agents via Targeted Human Judgements". DOI.</description></item><item><title>Surveillance</title><link>https://hugocisneros.com/notes/surveillance/</link><pubDate>Wed, 22 Feb 2023 13:28:00 +0100</pubDate><guid>https://hugocisneros.com/notes/surveillance/</guid><description> tags Privacy, Society CCTV Surveillance and crime prevention (Piza et al. 2019)
Bibliography Eric L. Piza, Brandon C. Welsh, David P. Farrington, Amanda L. Thomas. February 2019. "CCTV Surveillance for Crime Prevention: A 40‐year Systematic Review with Meta‐analysis". Criminology &amp; Public Policy 18 (1):135–59. DOI.</description></item><item><title>Ssh</title><link>https://hugocisneros.com/notes/ssh/</link><pubDate>Tue, 21 Feb 2023 16:52:00 +0100</pubDate><guid>https://hugocisneros.com/notes/ssh/</guid><description>tags Cryptography SSH random art algorithm It is described in this paper by Dirk Loss, Tobias Limmer, and Alexander von Gernler.</description></item><item><title>Privacy</title><link>https://hugocisneros.com/notes/privacy/</link><pubDate>Tue, 21 Feb 2023 15:48:00 +0100</pubDate><guid>https://hugocisneros.com/notes/privacy/</guid><description> tags Society</description></item><item><title>LayerNorm</title><link>https://hugocisneros.com/notes/layernorm/</link><pubDate>Tue, 21 Feb 2023 15:44:00 +0100</pubDate><guid>https://hugocisneros.com/notes/layernorm/</guid><description>tags Neural networks paper (Ba et al. 2016) Definition Layer Normalization is a technique used in deep learning to normalize the inputs to a layer in a neural network.
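Layer normalization and batch normalization differ only in the axis over which the mean and variance are computed. The NumPy sketch below works on a (batch, features) array and omits the learned gain and bias that both layers normally apply.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # One mean/variance per example, computed over its features (last axis).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def batch_norm(x, eps=1e-5):
    # One mean/variance per feature, computed over the batch (first axis).
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(8, 5))
print(layer_norm(x).mean(axis=1).round(6))  # ~0 for every example
print(batch_norm(x).mean(axis=0).round(6))  # ~0 for every feature
```

Layer norm never looks across examples, so it behaves the same for any batch size, including a batch of one at inference time.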
In batch normalization, the mean and variance of each batch of inputs to a layer are used to normalize the inputs. In layer normalization, the mean and variance of all the features in a layer (i.e., all the inputs for a given instance) are used to normalize the inputs.</description></item><item><title>Batch normalization</title><link>https://hugocisneros.com/notes/batch_normalization/</link><pubDate>Tue, 21 Feb 2023 14:47:00 +0100</pubDate><guid>https://hugocisneros.com/notes/batch_normalization/</guid><description> tags Neural networks</description></item><item><title>ChatGPT</title><link>https://hugocisneros.com/notes/chatgpt/</link><pubDate>Mon, 13 Feb 2023 13:26:00 +0100</pubDate><guid>https://hugocisneros.com/notes/chatgpt/</guid><description>tags GPT, Transformers, NLP blog post OpenAI blog post Architecture ChatGPT takes a GPT3.5 (aka GPT3 Davinci-003) pretrained model and uses RLHF to fine-tune the model similarly to InstructGPT but with some differences in the data collection. It is also more than &amp;ldquo;just&amp;rdquo; a model since it includes extensions for Memory Store and retrieval similar to BlenderBot 3.
Parameter count 175B</description></item><item><title>Reinforcement learning with human feedback</title><link>https://hugocisneros.com/notes/reinforcement_learning_with_human_feedback/</link><pubDate>Mon, 13 Feb 2023 13:23:00 +0100</pubDate><guid>https://hugocisneros.com/notes/reinforcement_learning_with_human_feedback/</guid><description> tags Reinforcement learning, NLP</description></item><item><title>BlenderBot 3</title><link>https://hugocisneros.com/notes/blenderbot_3/</link><pubDate>Mon, 13 Feb 2023 13:18:00 +0100</pubDate><guid>https://hugocisneros.com/notes/blenderbot_3/</guid><description>tags Transformers, GPT, OPT: Open Pre-trained Transformer, NLP blog post Meta AI announcement blog post paper (Shuster et al. 2022) Architecture It is based on a pre-trained OPT model, with some optimizations to make it better as a dialog agent, such as long term memory and the ability to search the web.
It uses human feedback to fine-tune its results on some tasks.
Parameter count 175B
Bibliography Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, et al.</description></item><item><title>Lenia</title><link>https://hugocisneros.com/notes/lenia/</link><pubDate>Thu, 02 Feb 2023 18:23:00 +0100</pubDate><guid>https://hugocisneros.com/notes/lenia/</guid><description>tags Cellular automata papers (Chan 2019; Chan 2020) Lenia is a continuous cellular automaton initially developed by Bert Chan. It is sometimes referred to as a &amp;ldquo;continuous Conway&amp;rsquo;s Game of Life&amp;rdquo;.
Definition Lenia is defined by a PDE that describes the evolution of the scalar field \(\mathbf{A}\) given a convolution kernel \(\mathbf{K}\) and a growth mapping \(G\):
\begin{equation} \mathbf{A}^{t+\Delta t} = \left[ \mathbf{A}^{t} + \Delta t \;G \big(\mathbf{K}*\mathbf{A}^t\big) \right]_0^1 \end{equation}</description></item><item><title>Neural network training</title><link>https://hugocisneros.com/notes/neural_network_training/</link><pubDate>Thu, 02 Feb 2023 18:19:00 +0100</pubDate><guid>https://hugocisneros.com/notes/neural_network_training/</guid><description>tags Neural networks, Machine learning, Optimization A common algorithm for neural network training is backpropagation.
Neural network training as development in program space A neural network as a whole can be seen as a dynamical system. Its state is the collection of its parameters, and its evolution function is the optimization step taken when training the network.
A neural network has parameters \(\theta_t\) at time \(t\), which can be seen as its state.</description></item><item><title>Neural architecture search</title><link>https://hugocisneros.com/notes/neural_architecture_search/</link><pubDate>Thu, 02 Feb 2023 18:17:00 +0100</pubDate><guid>https://hugocisneros.com/notes/neural_architecture_search/</guid><description>tags Search, Neural networks Neural architecture search (NAS) is a method for finding neural network architectures. It is usually based on three main components:
Search space The type of network that can be built. Search strategy The approach used to explore the space. Performance estimation strategy The way the performance of a candidate neural network is evaluated (ideally without fully training or running it). Reinforcement learning-based NAS The original idea was called Neural architecture search and uses an RNN as a controller that generates architectures.</description></item><item><title>Backpropagation</title><link>https://hugocisneros.com/notes/backpropagation/</link><pubDate>Thu, 02 Feb 2023 18:16:00 +0100</pubDate><guid>https://hugocisneros.com/notes/backpropagation/</guid><description> tags Algorithm, Neural networks</description></item><item><title>Notes on: Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks by Voelker, A., Kajić, I., &amp; Eliasmith, C. (2019)</title><link>https://hugocisneros.com/notes/voelkerlegendrememoryunits2019/</link><pubDate>Thu, 02 Feb 2023 16:38:00 +0100</pubDate><guid>https://hugocisneros.com/notes/voelkerlegendrememoryunits2019/</guid><description>tags Recurrent neural networks source (Voelker et al. 2019) Summary This paper introduces the LMU recurrent cell. Like the LSTM, this cell maintains a memory in its hidden state. The main idea of the paper is to make this memory satisfy a set of first-order linear ordinary differential equations.
\begin{equation} \theta \dot{m}(t) = Am(t) + Bu(t) \end{equation}
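As a rough illustration, the sketch below builds the matrices \(A\) and \(B\) of this system and integrates it with a plain forward-Euler step. It assumes the closed-form definitions given in the LMU paper, with indices \(i, j\) running from \(0\) to \(d-1\). The helper names `lmu_matrices` and `euler_step` are hypothetical. The paper may discretize the system differently, so this is only a way to watch the dynamics, not the paper's implementation.

```python
def lmu_matrices(d):
    """Build the d x d matrix A and the length-d vector B of the LMU memory.

    Assumed definitions (Voelker et al. 2019):
      A[i][j] = (2i + 1) * (-1)                 if i is smaller than j
      A[i][j] = (2i + 1) * (-1) ** (i - j + 1)  otherwise
      B[i]    = (2i + 1) * (-1) ** i
    """
    A = [[0.0] * d for _ in range(d)]
    B = [0.0] * d
    for i in range(d):
        for j in range(d):
            if i >= j:
                A[i][j] = (2 * i + 1) * (-1) ** (i - j + 1)
            else:
                A[i][j] = (2 * i + 1) * -1
        B[i] = (2 * i + 1) * (-1) ** i
    return A, B


def euler_step(m, u, A, B, theta, dt):
    """One forward-Euler step of theta * dm/dt = A m + B u (hypothetical helper)."""
    d = len(m)
    dm = [sum(A[i][j] * m[j] for j in range(d)) + B[i] * u for i in range(d)]
    return [m[i] + dt / theta * dm[i] for i in range(d)]


# Usage: feed a constant input into a 4-dimensional memory with a window theta = 1.
A, B = lmu_matrices(4)
m = [0.0] * 4
for _ in range(100):
    m = euler_step(m, u=1.0, A=A, B=B, theta=1.0, dt=0.001)
```

The memory dimension \(d\) sets how many Legendre coefficients are kept, and \(\theta\) sets the length of the window being represented.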
The solution \(m(t)\) of this system represents a sliding window of \(u\) (of length \(\theta\)) in a basis of Legendre polynomials.</description></item><item><title>Energy</title><link>https://hugocisneros.com/notes/energy/</link><pubDate>Fri, 27 Jan 2023 11:03:00 +0100</pubDate><guid>https://hugocisneros.com/notes/energy/</guid><description> tags Climate Renewable energy Non-renewable energy</description></item><item><title>Carbon emissions</title><link>https://hugocisneros.com/notes/carbon_emissions/</link><pubDate>Tue, 24 Jan 2023 16:28:00 +0100</pubDate><guid>https://hugocisneros.com/notes/carbon_emissions/</guid><description>tags Climate Scopes of carbon emissions Usually, greenhouse gas emissions are divided into multiple categories that correspond to different scopes of involvement in the emissions. An entity is responsible for all these emissions, but it may have different levels of control over the various scopes.
Scope 1: Direct emissions This is the CO2 or other greenhouse gas (GHG) the organization emits directly. Examples: direct fossil fuel burning, methane leaks, etc.</description></item><item><title>Carbon capture</title><link>https://hugocisneros.com/notes/carbon_capture/</link><pubDate>Tue, 24 Jan 2023 16:20:00 +0100</pubDate><guid>https://hugocisneros.com/notes/carbon_capture/</guid><description>tags Climate, Carbon emissions Carbon capture consists of using devices to capture and store some of the carbon emitted in the past or that will be emitted in the near future.
Carbon capture will be too expensive for too little benefit ‘Carbon Capture’ Is No Fix. Big Oil’s Known for Decades &amp;mdash; The Tyee, 7 July 2022</description></item><item><title>Carbon offsetting</title><link>https://hugocisneros.com/notes/carbon_offsetting/</link><pubDate>Tue, 24 Jan 2023 16:18:00 +0100</pubDate><guid>https://hugocisneros.com/notes/carbon_offsetting/</guid><description> tags Climate, Carbon emissions The Carbon Con, an article from Source Material
Bibliography</description></item><item><title>Carbon tax</title><link>https://hugocisneros.com/notes/carbon_tax/</link><pubDate>Tue, 24 Jan 2023 16:18:00 +0100</pubDate><guid>https://hugocisneros.com/notes/carbon_tax/</guid><description>tags Climate, Economic liberalism, Taxation The idea behind the carbon tax is to tax carbon emissions from private entities, public institutions or individuals so as to create an economic incentive for emitting less carbon dioxide.
Carbon tax and redistribution Perception and public opinion Slides by Mathild Mus (in French)</description></item><item><title>Climate</title><link>https://hugocisneros.com/notes/climate/</link><pubDate>Mon, 16 Jan 2023 21:10:00 +0100</pubDate><guid>https://hugocisneros.com/notes/climate/</guid><description>tags Complex Systems International environmental agreements From (Pouw et al. 2022):
First, at the international level, universal coalitions are more cost-efficient and effective than fragmented regimes, but more difficult to negotiate and less stable. Second, in developing countries, there is need for substantial external funding to cover the short-run costs of environmental compliance. Third, market-based solutions have been increasingly applied in international agreements but with mixed results.
Climate policies Acceptability of climate policies From (Dechezleprêtre et al.</description></item><item><title>Gig economy</title><link>https://hugocisneros.com/notes/gig_economy/</link><pubDate>Wed, 11 Jan 2023 15:50:00 +0100</pubDate><guid>https://hugocisneros.com/notes/gig_economy/</guid><description> tags Economics, Economic liberalism Flexible work and exploitation (Chung 2022)
Gig economy and unions (Gray 2022)
Bibliography Heejung Chung. 2022. The Flexibility Paradox: Why Flexible Working Leads to (Self-)Exploitation. Polity Press. Paul Christopher Gray. 2022. "'The Same Tools Work Everywhere': Organizing Gig Workers with Foodsters United". Labour / Le Travail 90 (1). The Canadian Committee on Labour History: 41–84. https://muse.jhu.edu/pub/151/article/870057.</description></item><item><title>ALife 2020</title><link>https://hugocisneros.com/notes/alife_2020/</link><pubDate>Wed, 11 Jan 2023 09:48:00 +0100</pubDate><guid>https://hugocisneros.com/notes/alife_2020/</guid><description>tags ALife Conference, Artificial life Day 1 Tutorial - Functional programming for artificial life Tutorial - Visualization Principles and Techniques for Research in ALife Mike Levin - Keynote Lecture Day 2 Sara Walker, keynote Lecture About what life means and how it can be defined from the point of view of physics/information theory, etc.
Melanie Mitchell, keynote Lecture This talk was very similar to another one I watched from the Santa Fe Institute, which promotes her book Artificial Intelligence: A Guide for Thinking Humans.</description></item><item><title>Transformers</title><link>https://hugocisneros.com/notes/transformers/</link><pubDate>Thu, 05 Jan 2023 14:13:00 +0100</pubDate><guid>https://hugocisneros.com/notes/transformers/</guid><description>tags Neural networks resources Transformer catalog, The illustrated transformer Transformers are a neural network architecture based on a mechanism called Attention.
They have been particularly successful in NLP applications, a trend that started around the publication of a very influential paper by Vaswani and colleagues (Vaswani et al. 2017). Transformers turned out to be very effective language models.
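A minimal sketch of the attention mechanism. It assumes the scaled dot-product form \(\mathrm{softmax}(QK^T/\sqrt{d_k})V\) from (Vaswani et al. 2017), restricted to a single head with no masking and written with plain Python lists. The names `softmax` and `attention` are illustrative helpers, not a library API.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q: n_q x d_k, K: n_k x d_k, V: n_k x d_v (lists of lists).
    Returns an n_q x d_v list of lists.
    """
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row = weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Usage: one query attending over two key/value pairs; the matching key gets more weight.
print(attention(Q=[[1.0, 0.0]], K=[[1.0, 0.0], [0.0, 1.0]], V=[[10.0, 0.0], [0.0, 10.0]]))
```

Each output row is a convex combination of the value vectors, with weights set by how well the query matches each key.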
They have since spread to other fields of machine learning, such as Computer vision and Reinforcement learning.</description></item><item><title>Radon transform</title><link>https://hugocisneros.com/notes/radon_transform/</link><pubDate>Mon, 02 Jan 2023 11:36:00 +0100</pubDate><guid>https://hugocisneros.com/notes/radon_transform/</guid><description> tags Signal processing From this tweet:
The marginals \(f_X\) and \(f_Y\) of a joint distribution \(f(x, y)\) can be seen as the Radon transform of \(f(x, y)\) in the \(\theta = 0\) and \(\theta = \pi/2\) directions.
Similarly, a joint distribution can be thought of as an optimal transport solution to an undersampled tomography result.
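The marginal statement can be checked on a discretized joint distribution. On a grid, projecting along one axis reduces to summing the 2D array over that axis. This corresponds to the Radon transform at \(\theta = 0\) or \(\theta = \pi/2\), depending on the angle convention. The \(3 \times 3\) probability table below is made up for illustration.

```python
# Hypothetical discretized joint distribution f(x, y): rows indexed by x, columns by y.
f = [
    [0.10, 0.05, 0.05],
    [0.05, 0.20, 0.15],
    [0.00, 0.10, 0.30],
]

# Line integrals along y (one per value of x): the marginal f_X, i.e. one projection.
f_X = [sum(row) for row in f]

# Line integrals along x (one per value of y): the marginal f_Y, the orthogonal projection.
f_Y = [sum(f[i][j] for i in range(len(f))) for j in range(len(f[0]))]

print(f_X, f_Y)
```

Other angles would require summing along oblique lines (with interpolation), which is where the full Radon transform differs from simple marginalization.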
Bibliography</description></item><item><title>Generative art</title><link>https://hugocisneros.com/notes/generative_art/</link><pubDate>Wed, 28 Dec 2022 13:44:00 +0100</pubDate><guid>https://hugocisneros.com/notes/generative_art/</guid><description> tags Art, Algorithm</description></item><item><title>Creative coding</title><link>https://hugocisneros.com/notes/creative_coding/</link><pubDate>Wed, 28 Dec 2022 13:43:00 +0100</pubDate><guid>https://hugocisneros.com/notes/creative_coding/</guid><description> tags Coding, Art</description></item><item><title>Notes on: Efficient Neural Architecture Search via Parameter Sharing by Pham, H., Guan, M. Y., Zoph, B., Le, Q. V., &amp; Dean, J. (2018)</title><link>https://hugocisneros.com/notes/phamefficientneuralarchitecture2018/</link><pubDate>Fri, 23 Dec 2022 17:40:00 +0100</pubDate><guid>https://hugocisneros.com/notes/phamefficientneuralarchitecture2018/</guid><description>tags Neural architecture search source (Pham et al. 2018) Summary As in other papers, the controller is an RNN that generates each part of the architecture in sequence. The main contribution of this paper is to introduce parameter sharing between child models. To do this, it represents all possible architectures in a single DAG of operations and shares the weights of each operation across all child models that use it. They explain how to design an RNN cell with their method, a convolutional network (and a convolutional cell to build a CNN), and how to train the whole system.</description></item><item><title>Art</title><link>https://hugocisneros.com/notes/art/</link><pubDate>Thu, 22 Dec 2022 11:19:00 +0100</pubDate><guid>https://hugocisneros.com/notes/art/</guid><description/></item><item><title>Art with Cellular Automata</title><link>https://hugocisneros.com/notes/art_with_cellular_automata/</link><pubDate>Thu, 22 Dec 2022 11:19:00 +0100</pubDate><guid>https://hugocisneros.com/notes/art_with_cellular_automata/</guid><description>tags Art Cellular automata have been widely used to create various forms of Generative art.
Here is a collection of some interesting examples:
Examples based on Neural CA Self organizing textures: (Niklasson et al. 2021) Dialogue: an art project using interacting Neural CA to generate patterns Revisiting classical CA Crosshatch CA CA Music Wolfram tones Bibliography Eyvind Niklasson, Alexander Mordvintsev, Ettore Randazzo, Michael Levin. February 11, 2021. "Self-Organising Textures"</description></item><item><title>Machine learning</title><link>https://hugocisneros.com/notes/machine_learning/</link><pubDate>Wed, 21 Dec 2022 09:16:00 +0100</pubDate><guid>https://hugocisneros.com/notes/machine_learning/</guid><description>tags Artificial Intelligence, Applied maths Machine learning is about constructing algorithms that can approximate complex functions from observations of input/output pairs. Machine learning is related to Statistics since its goal is to make predictions based on data.
Regression The goal is to approximate a target function \(f\) or signal \(S\). The output space is often continuous.
Classification The goal is to approximate a target function that assigns labels to input points.</description></item><item><title>Huffman coding</title><link>https://hugocisneros.com/notes/huffman_coding/</link><pubDate>Tue, 20 Dec 2022 16:59:00 +0100</pubDate><guid>https://hugocisneros.com/notes/huffman_coding/</guid><description>tags Compression, Entropy coding Python implementation from heapq import heappush, heappop, heapify def huffman_coding(frequency_dict): # Create a heap of [frequency, [character, code]] entries heap = [[frequency, [character, &amp;#34;&amp;#34;]] for character, frequency in frequency_dict.items()] heapify(heap) while len(heap) &amp;gt; 1: # Extract the two nodes with the lowest frequencies left, right = heappop(heap), heappop(heap) # Prepend a &amp;#34;0&amp;#34; to the codes of the left subtree and a &amp;#34;1&amp;#34; to those of the right subtree for pair in left[1:]: pair[1] = &amp;#34;0&amp;#34; + pair[1] for pair in right[1:]: pair[1] = &amp;#34;1&amp;#34; + pair[1] # Merge the two nodes and add the resulting node back to the heap heappush(heap, [left[0] + right[0]] + left[1:] + right[1:]) # Extract the coding dictionary from the heap return dict(sorted(heappop(heap)[1:], key=lambda p: (len(p[-1]), p))) # Example usage frequency_dict = {&amp;#39;a&amp;#39;: 45, &amp;#39;b&amp;#39;: 13, &amp;#39;c&amp;#39;: 12, &amp;#39;d&amp;#39;: 16, &amp;#39;e&amp;#39;: 9, &amp;#39;f&amp;#39;: 5, &amp;#39;r&amp;#39;: 2, &amp;#39;q&amp;#39;: 1} print(huffman_coding(frequency_dict)) The code above outputs the following Python dictionary:</description></item><item><title>Algorithmic bias</title><link>https://hugocisneros.com/notes/algorithmic_bias/</link><pubDate>Tue, 20 Dec 2022 16:25:00 +0100</pubDate><guid>https://hugocisneros.com/notes/algorithmic_bias/</guid><description> tags Machine learning</description></item><item><title>Image classification</title><link>https://hugocisneros.com/notes/image_classification/</link><pubDate>Tue, 20 Dec 2022 16:12:00 +0100</pubDate><guid>https://hugocisneros.com/notes/image_classification/</guid><description>tags Computer vision Image classification is a machine learning task associated with Computer vision. Its goal is to assign a class to a particular image. When this class corresponds to an object depicted in the image, the task is called Object recognition.
In general, image classification can also be applied to many other areas. For example, in medical imaging, one may want to design an algorithm that can classify
Convolutional neural networks have been successful at many image classification tasks, and are starting to be overtaken by transformers for some applications.</description></item><item><title>Object recognition</title><link>https://hugocisneros.com/notes/object_recognition/</link><pubDate>Tue, 20 Dec 2022 16:09:00 +0100</pubDate><guid>https://hugocisneros.com/notes/object_recognition/</guid><description> tags Computer vision</description></item><item><title>Resource curse</title><link>https://hugocisneros.com/notes/resource_curse/</link><pubDate>Tue, 20 Dec 2022 15:08:00 +0100</pubDate><guid>https://hugocisneros.com/notes/resource_curse/</guid><description>tags Economics Definition The resource curse is a phenomenon in which countries with an abundance of natural resources, such as oil, minerals, and other raw materials, often end up with slower economic growth and less democratic governments than countries without such resources. This may be because the wealth generated by the exploitation of these resources is not distributed evenly and can lead to corruption, conflict, and social unrest. The resource curse is also referred to as the &amp;ldquo;paradox of plenty&amp;rdquo;.</description></item><item><title>Dutch disease</title><link>https://hugocisneros.com/notes/dutch_disease/</link><pubDate>Tue, 20 Dec 2022 14:54:00 +0100</pubDate><guid>https://hugocisneros.com/notes/dutch_disease/</guid><description>tags Economics resources Wikipedia Definition From (Corden 1984):
The term Dutch Disease refers to the adverse effects on Dutch manufacturing of the natural gas discoveries of the nineteen sixties, essentially through the subsequent appreciation of the Dutch real exchange rate [footnote]. [footnote] The first printed reference to the term I have found is in the article &amp;ldquo;The Dutch Disease&amp;rdquo; in The Economist November 26th 1977, pp. 82-3.
After the discovery of this large natural gas field in 1959, the Dutch currency strengthened against other currencies thanks to increased gas exports.</description></item><item><title>Reinforcement learning</title><link>https://hugocisneros.com/notes/reinforcement_learning/</link><pubDate>Wed, 14 Dec 2022 20:41:00 +0100</pubDate><guid>https://hugocisneros.com/notes/reinforcement_learning/</guid><description>tags Machine learning In reinforcement learning, agents take actions within an environment. Usually, both the agent and environment states change in reaction to these actions. A reward is given to the agent to tell it whether the action was good or bad.
The goal of a learning agent is to act so as to maximize that reward.
An agent can be anything from a fixed set of if-else statements to a deep neural network.</description></item><item><title>Batteries</title><link>https://hugocisneros.com/notes/batteries/</link><pubDate>Thu, 24 Nov 2022 21:07:00 +0100</pubDate><guid>https://hugocisneros.com/notes/batteries/</guid><description>tags Climate, Energy Gigafactories The word gigafactory was coined by Tesla, who started building its first massive factory in 2014 in Nevada. The word corresponds to the scale of the output of these factories which is on the order of GWh of total battery capacity constructed per year. Now the word is used for all battery factories.
According to a study by Ultimate Media and ABB, the global battery production was 450 GWh in 2020.</description></item><item><title>Fourier transform</title><link>https://hugocisneros.com/notes/fourier_transform/</link><pubDate>Wed, 16 Nov 2022 09:23:00 +0100</pubDate><guid>https://hugocisneros.com/notes/fourier_transform/</guid><description>tags Mathematics, Signal processing Gibbs phenomenon The Gibbs phenomenon appears when a discontinuous function is approximated with Fourier coefficients. The result is an overshoot at the discontinuities that does not decrease as more terms are added to the approximation.</description></item><item><title>L-Systems</title><link>https://hugocisneros.com/notes/l_systems/</link><pubDate>Thu, 20 Oct 2022 15:11:00 +0200</pubDate><guid>https://hugocisneros.com/notes/l_systems/</guid><description>tags Complex Systems L-systems are string re-writing systems. They operate on an alphabet of symbols, rewriting symbols or patterns of symbols into new patterns according to a set of rules.
Examples Fractal tree A simple binary tree can be constructed with an L-system, using the rules (1 → 11) and (0 → 1[0]0) and starting from a single 0.
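A minimal sketch of the rewriting step for this example. Both rules are applied in parallel to every symbol of the current string, and the brackets, which have no rule, are copied unchanged. The `rewrite` helper name is illustrative.

```python
RULES = {"1": "11", "0": "1[0]0"}  # rules of the fractal tree example


def rewrite(axiom, rules, n):
    """Apply the rewriting rules n times, in parallel, to every symbol."""
    s = axiom
    for _ in range(n):
        # Symbols without a rule (here "[" and "]") are copied unchanged.
        s = "".join(rules.get(c, c) for c in s)
    return s


for n in range(4):
    print(n, rewrite("0", RULES, n))
# 0 0
# 1 1[0]0
# 2 11[1[0]0]1[0]0
# 3 1111[11[1[0]0]1[0]0]11[1[0]0]1[0]0
```

The string roughly doubles in length at each iteration, so only a handful of iterations are needed before drawing.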
The resulting string can then be drawn by a turtle drawer, where:</description></item><item><title>Einops</title><link>https://hugocisneros.com/notes/einops/</link><pubDate>Wed, 19 Oct 2022 15:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/einops/</guid><description> tags Coding Einops is an array manipulation paradigm that uses string descriptions of array operations to make complex manipulations (such as summing, broadcasting and reshaping on specific axes only) easier.
Examples Max pooling import einops as ein patches = ein.rearrange(img, &amp;#39;(h i) (w j) c -&amp;gt; h w i j c&amp;#39;, i=20, j=20) max_pool = ein.reduce(patches, &amp;#39;h w i j c -&amp;gt; h w c&amp;#39;, &amp;#39;max&amp;#39;)</description></item><item><title>Diffusion models</title><link>https://hugocisneros.com/notes/diffusion_models/</link><pubDate>Wed, 19 Oct 2022 15:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/diffusion_models/</guid><description>tags Generative modelling papers (Sohl-Dickstein et al. 2015), (Ho et al. 2020) Principle of diffusion Forward diffusion An image of size \(N\) by \(N\) \(x_0\), which is a vector in \(\mathbb{R}^{N \times N \times c}\) is diffused at each timestep \(t\) to become \(x_t\). The forward diffusion step is defined as follows: \[ q(\boldsymbol{x}_t | \boldsymbol{x}_{t-1}) = \mathcal{N}(\boldsymbol{x}_t; \sqrt{1 - \beta_t} \boldsymbol{x}_{ t - 1 }, \beta_t I) \] The probability of a sequence of images \(x_1, \ldots, x_T\) is then \[ q(\boldsymbol{x}_1, \ldots, \boldsymbol{x}_T | \boldsymbol{x}_0) = \prod_{t=1}^T q(\boldsymbol{x}_t|\boldsymbol{x}_{t -1}) \]</description></item><item><title>Frank-Wolfe algorithm</title><link>https://hugocisneros.com/notes/frank_wolfe_algorithm/</link><pubDate>Thu, 22 Sep 2022 12:44:00 +0200</pubDate><guid>https://hugocisneros.com/notes/frank_wolfe_algorithm/</guid><description>tags Optimization, Algorithm resources Fabian Pedregosa&amp;rsquo;s series on FW Definition It was originally published in (Frank, Wolfe 1956) and (Jaggi 2013) gives a more recent overview.
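A minimal sketch of the forward process above, assuming a flattened image stored as a Python list. Each step samples \(x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon\) with \(\epsilon \sim \mathcal{N}(0, I)\), which is a draw from \(q(x_t | x_{t-1})\). Chaining the steps samples the product distribution. The linear schedule from \(10^{-4}\) to \(0.02\) follows the one used in (Ho et al. 2020), but it is an assumption here, and the helper names are illustrative.

```python
import math
import random


def linear_betas(T, beta_1=1e-4, beta_T=0.02):
    """Linearly spaced noise schedule beta_1, ..., beta_T (assumed schedule)."""
    if T == 1:
        return [beta_1]
    return [beta_1 + (beta_T - beta_1) * t / (T - 1) for t in range(T)]


def forward_step(x_prev, beta, rng):
    """Sample x_t ~ N(sqrt(1 - beta) * x_{t-1}, beta * I), coordinate by coordinate."""
    return [math.sqrt(1 - beta) * x + math.sqrt(beta) * rng.gauss(0.0, 1.0) for x in x_prev]


def forward_diffusion(x0, betas, seed=0):
    """Return the trajectory [x_0, x_1, ..., x_T] by chaining the forward steps."""
    rng = random.Random(seed)
    xs = [list(x0)]
    for beta in betas:
        xs.append(forward_step(xs[-1], beta, rng))
    return xs


# Usage: a toy "image" of 4 pixels diffused over T = 1000 steps.
traj = forward_diffusion([1.0, -1.0, 0.5, 0.0], linear_betas(1000))
print(traj[-1])  # approximately a sample from N(0, I): the signal has almost vanished
```

Because \(\sqrt{1-\beta_t}\) is slightly below 1 at every step, the contribution of \(x_0\) decays geometrically while the added noise keeps the variance close to 1.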
For a differentiable function \(f\) with \(L\)-Lipschitz gradient, whose domain \(\mathcal{C}\) is a convex and compact set, we want to solve the optimization problem:
\[ \min_{\boldsymbol{x} \in \mathcal{C}} f(\boldsymbol{x}) \]
The algorithm starts with an initial guess \(\boldsymbol{x}_0\) and constructs a sequence of values \(\boldsymbol{x}_1, \boldsymbol{x}_2, \cdots\) which converges to the solution.</description></item><item><title>Notes on: Git Re-Basin: Merging Models modulo Permutation Symmetries by Ainsworth, S. K., Hayase, J., &amp; Srinivasa, S. (2022)</title><link>https://hugocisneros.com/notes/ainsworthgitrebasinmerging2022/</link><pubDate>Mon, 19 Sep 2022 11:09:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ainsworthgitrebasinmerging2022/</guid><description> source (Ainsworth et al. 2022) tags Neural networks TODO Summary This paper introduces various methods for matching and interpolating the weights of multiple neural networks of the same architecture trained from different starting points or data. These neural networks have different weight values after the training.
TODO Comments Bibliography Samuel K. Ainsworth, Jonathan Hayase, Siddhartha Srinivasa. September 11, 2022. "Git Re-Basin: Merging Models Modulo Permutation Symmetries". September 11, 2022. DOI.</description></item><item><title>Complex Systems</title><link>https://hugocisneros.com/notes/complex_systems/</link><pubDate>Wed, 07 Sep 2022 10:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/complex_systems/</guid><description>tags Physics Definition By a complex system I mean one made up of a large number of parts that interact in a nonsimple way.
&amp;mdash; Herbert Simon, 1962
Bottomless wonders spring from simple rules, which are repeated without end.
&amp;mdash; Mandelbrot, ~1980
When we talk about complex systems evolving in time, we often use the term complex dynamical systems.
Examples of complex systems The economy (Anderson 1996) Boolean networks Cellular automata Neural networks Many physical systems Understanding complex systems There is an interesting series of articles about how different scientific disciplines approach the same problem of understanding an incredibly complex system that we initially don&amp;rsquo;t know anything about:</description></item><item><title>Neoliberalism</title><link>https://hugocisneros.com/notes/neoliberalism/</link><pubDate>Tue, 06 Sep 2022 08:35:00 +0200</pubDate><guid>https://hugocisneros.com/notes/neoliberalism/</guid><description>tags Economics, Economic liberalism Definition In (Hay 2004), the author gives the following definition of neoliberalism:
Economic neoliberalism, I suggest, can be defined in terms of the following traits:
A confidence in the market as an efficient mechanism for the allocation of scarce resources. A belief in the desirability of a global regime of free trade and free capital mobility. A belief in the desirability, all things being equal, of a limited and non-interventionist role for the state and of the state as a facilitator and custodian rather than a substitute for market mechanisms.</description></item><item><title>Gradient descent for wide two-layer neural networks – I : Global convergence</title><link>https://hugocisneros.com/notes/gradient_descent_for_wide_two_layer_neural_networks_i_global_convergence/</link><pubDate>Thu, 01 Sep 2022 08:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gradient_descent_for_wide_two_layer_neural_networks_i_global_convergence/</guid><description>tags Neural networks, Optimization authors Francis Bach, Lénaïc Chizat source Francis Bach&amp;rsquo;s blog In the rest, we use the mathematical definition of a neural network from Neural networks.
Two layer neural network Even simple neural network models are very difficult to analyze. This is primarily due to two difficulties:
Non-linearity: the problem is typically non-convex, which in general is a bad thing in optimization. Overparametrization: there are often a lot of parameters, sometimes many more parameters than observations.</description></item><item><title>Linear programming</title><link>https://hugocisneros.com/notes/linear_programming/</link><pubDate>Tue, 30 Aug 2022 21:26:00 +0200</pubDate><guid>https://hugocisneros.com/notes/linear_programming/</guid><description>tags Optimization Linear programs are problems that can be expressed as
\begin{align} &amp;amp; \text{Find a vector} &amp;amp;&amp;amp; \mathbf{x} \\ &amp;amp; \text{that maximizes} &amp;amp;&amp;amp; \mathbf{c}^T \mathbf{x}\\ &amp;amp; \text{subject to} &amp;amp;&amp;amp; A \mathbf{x} \leq \mathbf{b} \\ &amp;amp; \text{and} &amp;amp;&amp;amp; \mathbf{x} \ge \mathbf{0}. \end{align}</description></item><item><title>Hopfield Networks</title><link>https://hugocisneros.com/notes/hopfield_networks/</link><pubDate>Tue, 30 Aug 2022 21:25:00 +0200</pubDate><guid>https://hugocisneros.com/notes/hopfield_networks/</guid><description>tags Neural networks Hopfield networks are a kind of recurrent neural network with binary threshold nodes.
Definition Nodes have indexes \(i \in \{1, \cdots, n\}\) and are in state \(s_i \in \{-1, 1\}\). Nodes have connections between them, characterized by a weight \(w_{ij}\). Each node also has an associated threshold \(\theta_i\) such that
\[ s_i \leftarrow \begin{cases} +1 &amp;amp; \text{if}\ \sum_j w_{ij} s_j \geq \theta_i, \newline -1 &amp;amp; \text{otherwise}. \end{cases} \]</description></item><item><title>Talk: Alife 2020 keynote Sara Walker - The Natural History of Information</title><link>https://hugocisneros.com/notes/talk_alife_2020_keynote_sara_walker_the_natural_history_of_information/</link><pubDate>Tue, 30 Aug 2022 21:18:00 +0200</pubDate><guid>https://hugocisneros.com/notes/talk_alife_2020_keynote_sara_walker_the_natural_history_of_information/</guid><description>tags Life The problem of defining life Definitions of life have always been elusive.
Life does not exist.
&amp;mdash; Andrew Ellington (American Chemical Society 2012).
as one focuses experimentally on any of the &amp;lsquo;defining&amp;rsquo; properties of &amp;lsquo;life&amp;rsquo;, the sharp boundary seems to blur, splitting into finer and finer sub-divisions
&amp;mdash; Jack Szostak (J. Biomolecular Struc. Dyn. 29.4 (2012) : 599-600.)
When looking at matter down to the chemical level, it&amp;rsquo;s hard to tell what is fundamentally different between living and non-living matter.</description></item><item><title>Cellular automata</title><link>https://hugocisneros.com/notes/cellular_automata/</link><pubDate>Tue, 30 Aug 2022 21:15:00 +0200</pubDate><guid>https://hugocisneros.com/notes/cellular_automata/</guid><description>tags Emergence, Chaos, Artificial Intelligence resources Wikipedia, (Von Neumann, Burks 1966; Wolfram 2002) Definition A cellular automaton is a computational model defined with respect to a regular grid of individual elements (called cells). Each of those cells can be in one of a finite number of states &amp;mdash; alive or dead, \(\{1, 2, 3\}\), etc.
A cellular automaton&amp;rsquo;s evolution is simulated in discrete timesteps. At each new timestep, cells are updated according to a local evolution rule.</description></item><item><title>Finance</title><link>https://hugocisneros.com/notes/finance/</link><pubDate>Tue, 30 Aug 2022 19:01:00 +0200</pubDate><guid>https://hugocisneros.com/notes/finance/</guid><description> tags Economics</description></item><item><title>Continual learning</title><link>https://hugocisneros.com/notes/continual_learning/</link><pubDate>Tue, 30 Aug 2022 16:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/continual_learning/</guid><description>tags Machine learning Continual learning is a type of supervised learning where there is no &amp;ldquo;testing phase&amp;rdquo; associated with a decision process. Instead, training samples keep being processed by the algorithm, which has to simultaneously make predictions and keep learning.
This is challenging for a fixed neural network architecture since it has a fixed capacity and is bound to either forget things or be unable to learn anything new.</description></item><item><title>Futures contracts</title><link>https://hugocisneros.com/notes/futures_contracts/</link><pubDate>Tue, 30 Aug 2022 16:29:00 +0200</pubDate><guid>https://hugocisneros.com/notes/futures_contracts/</guid><description> tags Finance, Economics</description></item><item><title>Unker non-linear writing system</title><link>https://hugocisneros.com/notes/unker_non_linear_writing_system/</link><pubDate>Fri, 19 Aug 2022 14:44:00 +0200</pubDate><guid>https://hugocisneros.com/notes/unker_non_linear_writing_system/</guid><description>tags Language resources https://s.ai/nlws/ A fascinating writing system based on glyphs connected to each other to create meaning. The system is quite advanced and reading its grammar is like discovering some new alien language. It was created by Alex Fink and Sai in 2010.
This is the kind of complexity that would be incredible to discover in open-ended evolving language systems. A system starting from elementary components, with no particular assumption about what language should be, could come up with such exotic models (probably even more exotic in the case of a truly open-ended system).</description></item><item><title>Optimal transport</title><link>https://hugocisneros.com/notes/optimal_transport/</link><pubDate>Thu, 18 Aug 2022 15:38:00 +0200</pubDate><guid>https://hugocisneros.com/notes/optimal_transport/</guid><description>tags Applied maths Ramified optimal transport Introduction to ramified optimal transportation</description></item><item><title>Graham scan</title><link>https://hugocisneros.com/notes/graham_scan/</link><pubDate>Thu, 04 Aug 2022 14:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/graham_scan/</guid><description>tags Algorithm Graham scan is an algorithm to find the convex hull of a set of points in 2D. It runs in \(\mathcal{O}(n\log n)\) time.
The algorithm is relatively simple. It starts by selecting the point with the lowest \(y\)-coordinate as a pivot, then sorts the remaining points once, in increasing order of the angle they make with the pivot. This sort dominates the \(\mathcal{O}(n\log n)\) cost. Points are then visited one by one in that order. Then, if this new point is</description></item><item><title>Kullback-Leibler divergence</title><link>https://hugocisneros.com/notes/kullback_leibler_divergence/</link><pubDate>Thu, 04 Aug 2022 14:09:00 +0200</pubDate><guid>https://hugocisneros.com/notes/kullback_leibler_divergence/</guid><description>tags Applied maths Definition The KL divergence is not symmetric. For \(P, Q\) defined on the same probability space \(\mathcal{X}\), the KL divergence of \(Q\) from \(P\) is \[ KL(P, Q) = \sum_{x \in \mathcal{X}} P(x) \log\left( \frac{P(x)}{Q(x)} \right) \]
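A minimal sketch of the discrete definition. It uses the natural logarithm, so the result is in nats. It assumes \(Q(x) > 0\) wherever \(P(x) > 0\), and terms with \(P(x) = 0\) contribute zero by convention. Evaluating the divergence in both directions shows the asymmetry. The function name is illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(P, Q) = sum_x P(x) log(P(x) / Q(x)) for discrete distributions given as lists.

    Uses the natural log (nats). Terms with P(x) = 0 are skipped (0 log 0 = 0).
    Assumes Q(x) > 0 wherever P(x) > 0; otherwise the true divergence is infinite
    and this sketch raises ZeroDivisionError.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            total += px * math.log(px / qx)
    return total


P = [0.5, 0.5]
Q = [0.9, 0.1]
print(kl_divergence(P, Q), kl_divergence(Q, P))  # two different values: KL is not symmetric
```

Using `math.log2` instead of `math.log` would give the result in bits, which matches the code-length interpretation.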
It has two main interpretations:
It is the information gain from using the right probability distribution \(P\) instead of \(Q\), or the amount of information lost by approximating \(P\) with \(Q\). It is the expected extra code length when encoding samples that follow \(P\) with a code optimized for \(Q\).</description></item><item><title>Notes on: Fast and stable MAP-Elites in noisy domains using deep grids by Flageat, M., &amp; Cully, A. (2020)</title><link>https://hugocisneros.com/notes/flageatfaststablemapelites2020/</link><pubDate>Thu, 04 Aug 2022 14:09:00 +0200</pubDate><guid>https://hugocisneros.com/notes/flageatfaststablemapelites2020/</guid><description>source (Flageat, Cully 2020) tags ALife 2020, MAP-Elites, Quality diversity Summary MAP-Elites can be problematic in the face of uncertainty because:
individuals can be unexpectedly lucky; the behavior space can be hard to estimate, which results in misplaced individuals. Some mitigation techniques have been explored, e.g. in (Justesen et al. 2019), and this paper introduces another way of dealing with noisy domains without using sampling.
Here the main idea is to replace the MAP-Elites grid with a &amp;ldquo;deep grid&amp;rdquo; that has an additional dimension.</description></item><item><title>Notes on: Resilient Life: An Exploration of Perturbed Autopoietic Patterns in Conway's Game of Life by Cika, A., Cohen, E., Kruszewski, G., Seet, L., Steinmann, P., &amp; Yin, W. (2020)</title><link>https://hugocisneros.com/notes/cikaresilientlifeexploration2020/</link><pubDate>Thu, 04 Aug 2022 14:08:00 +0200</pubDate><guid>https://hugocisneros.com/notes/cikaresilientlifeexploration2020/</guid><description>tags ALife 2020, Cellular automata, Autopoiesis source (Cika et al. 2020) Summary This paper studies how resistant GoL patterns are to perturbations and which structures could make this resistance possible. The authors also ask whether resilience is a universal property of computational systems.
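A minimal NumPy sketch of the setting, assuming a toroidal (wrap-around) grid of 0/1 cells: one Game of Life update, and a hypothetical `perturb` helper that adds or kills a single cell. Running a perturbed and an unperturbed pattern side by side is one simple way to probe resilience; the paper's own metrics are not reproduced here:

```python
import numpy as np


def life_step(grid):
    """One Game of Life update on a toroidal grid of 0/1 integers (sketch)."""
    # Count the 8 neighbors of every cell by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    born = np.logical_and(grid == 0, neighbors == 3)
    survives = np.logical_and(grid == 1, np.isin(neighbors, (2, 3)))
    return np.logical_or(born, survives).astype(grid.dtype)


def perturb(grid, y, x, alive):
    """Hypothetical helper: copy of grid with cell (y, x) set alive (additive) or dead (negative)."""
    out = grid.copy()
    out[y, x] = 1 if alive else 0
    return out
```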
They test two types of resilience:
Additive (add one or two live cells to the pattern); Negative (&amp;ldquo;kill&amp;rdquo; one or two live cells of the pattern). They use 3 metrics for resilience:</description></item><item><title>Notes on: Safe Reinforcement Learning through Meta-learned Instincts by Grbic, D., &amp; Risi, S. (2020)</title><link>https://hugocisneros.com/notes/grbicsafereinforcementlearning2020/</link><pubDate>Thu, 04 Aug 2022 14:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/grbicsafereinforcementlearning2020/</guid><description>source (Grbic, Risi 2020) tags Meta-learning, Reinforcement learning, ALife 2020 Summary In RL, an important goal is to find agents that can quickly adapt to changing environments while avoiding unsafe states. However, in deep RL, noise is often added to the actions for exploration, which can lead the agent into unsafe parts of the state-action space.
Figure 1: Slide from the Alife talk
The meta-learning setting of MAML is adapted to RL, with a policy network learning the policy in a standard way and an &amp;ldquo;instinctual network&amp;rdquo;, which is fixed for a group of tasks and modulates the regular policy with its own action vector.</description></item><item><title>Talk: Alife 2020 keynote Michael Levin - Robot Cancer</title><link>https://hugocisneros.com/notes/talk_alife_2020_keynote_michael_levin_robot_cancer/</link><pubDate>Thu, 04 Aug 2022 14:06:00 +0200</pubDate><guid>https://hugocisneros.com/notes/talk_alife_2020_keynote_michael_levin_robot_cancer/</guid><description>tags Emergence, Biological life, ALife 2020 How are organisms able to store information and pass it down through profound structural changes (from a caterpillar to a butterfly, when a flatworm is cut into multiple pieces, etc.)?
Embryogenesis is a reliable self-assembly process. It relies on stem cell differentiation, but that&amp;rsquo;s not enough: some tumors (teratomas) contain differentiated cells but don&amp;rsquo;t have the right 3D spatial organisation.
Where is the large-scale pattern specified?</description></item><item><title>Hilbert curve indexing</title><link>https://hugocisneros.com/notes/hilbert_curve_indexing/</link><pubDate>Thu, 04 Aug 2022 14:05:00 +0200</pubDate><guid>https://hugocisneros.com/notes/hilbert_curve_indexing/</guid><description>tags Coding Hilbert curves can be used for an interesting trick involving 2D array indexing. Because points that are close along the Hilbert curve are also close in 2D space, laying out a 2D array in this order can be more cache-friendly when neighbors of an array element are accessed frequently.
Figure 1: Hilbert curve with different numbers of iterations
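A Python sketch of the conversion, ported from the C routines commonly given for it (`xy2d`, `d2xy` and `rot`), assuming the grid side `n` is a power of two. The bitwise tests of the C version are written here with integer division and modulo, which is equivalent when `s` is a power of two:

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Map (x, y) in an n-by-n grid (n a power of two) to the distance d along the curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = (x // s) % 2  # same as testing (x AND s) for power-of-two s
        ry = (y // s) % 2
        d += s * s * ((3 * rx) ^ ry)
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d


def d2xy(n, d):
    """Inverse mapping: distance d along the curve to (x, y)."""
    x = y = 0
    t = d
    s = 1
    while n > s:
        rx = (t // 2) % 2
        ry = (t ^ rx) % 2
        x, y = rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Storing element (x, y) of an n-by-n array at flat index `xy2d(n, x, y)` gives the Hilbert layout; consecutive indices then always correspond to grid neighbors.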
C implementation from Wikipedia to convert (x,y) coordinates to linear ones and vice versa:</description></item><item><title>Notes on: Scaling down Deep Learning by Greydanus, S. (2020)</title><link>https://hugocisneros.com/notes/greydanusscalingdeeplearning2020/</link><pubDate>Thu, 04 Aug 2022 14:03:00 +0200</pubDate><guid>https://hugocisneros.com/notes/greydanusscalingdeeplearning2020/</guid><description>tags Neural networks source (Greydanus 2020) Summary This paper introduces a minimalist 1D version of the MNIST dataset for studying some basic properties of neural networks. The author simplifies the MNIST dataset by assigning a 1D glyph to each digit. These glyphs are padded, translated, sheared and blurred to build a dataset of multiple different objects.
The figure from the paper shown below illustrates this dataset&amp;rsquo;s construction:
Figure 1: 1D simple MNIST</description></item><item><title>Machine translation</title><link>https://hugocisneros.com/notes/machine_translation/</link><pubDate>Thu, 04 Aug 2022 13:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/machine_translation/</guid><description> tags NLP</description></item><item><title>Benford's law</title><link>https://hugocisneros.com/notes/benford_s_law/</link><pubDate>Thu, 04 Aug 2022 13:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/benford_s_law/</guid><description>tags Statistics A set of numbers satisfies Benford&amp;rsquo;s law if the leading digits of these numbers occur with a probability logarithmically decreasing with the digit. More precisely, for \(d \in \{1, \ldots, 9\}\),
\[ P(d) = \log_{10} \left(1 + \frac{1}{d} \right) \]
Many sequences that span multiple orders of magnitude satisfy Benford&amp;rsquo;s law, including the Fibonacci sequence (Washington 1981).
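A quick numerical check of the Fibonacci claim in Python: compute \(P(d)\) from the formula above and compare it with the leading-digit frequencies of the first 1000 Fibonacci numbers (the sample size is an arbitrary choice for this sketch):

```python
import math
from collections import Counter


def benford_probability(d: int) -> float:
    """Benford probability that the leading digit equals d, for d in 1..9."""
    return math.log10(1 + 1 / d)


def leading_digit(n: int) -> int:
    # First character of the decimal representation of a positive integer.
    return int(str(n)[0])


def fibonacci(count: int) -> list[int]:
    """First `count` Fibonacci numbers, starting 1, 1, 2, ..."""
    seq, a, b = [], 1, 1
    for _ in range(count):
        seq.append(a)
        a, b = b, a + b
    return seq


def leading_digit_frequencies(numbers):
    counts = Counter(leading_digit(n) for n in numbers)
    return {d: counts[d] / len(numbers) for d in range(1, 10)}


freqs = leading_digit_frequencies(fibonacci(1000))
for d in range(1, 10):
    print(d, round(freqs[d], 3), round(benford_probability(d), 3))
```

The observed and predicted columns agree to within about a percentage point for every digit.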
The law has been proposed for use in fraud detection, because fabricated numbers drawn uniformly at random would not follow it.</description></item><item><title>t-SNE</title><link>https://hugocisneros.com/notes/t_sne/</link><pubDate>Tue, 02 Aug 2022 14:03:00 +0200</pubDate><guid>https://hugocisneros.com/notes/t_sne/</guid><description>tags Machine learning paper (Van der Maaten, Hinton 2008) Example: Embedding the vertices of a high-dimensional cube in 2D. We first create a 12-dimensional cube:
import imageio
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# This creates a numpy array with the vertices of an N-dimensional cube.
# From https://stackoverflow.com/a/52229558
cube = lambda N: 2*((np.arange(2**N)[:,None] &amp;amp; (1 &amp;lt;&amp;lt; np.arange(N))) &amp;gt; 0) - 1

# Create a 12-hypercube&amp;#39;s vertices
cube_arr = cube(12)
steps = []
# We run t-SNE with 250 to 5000 iterations.</description></item><item><title>Adversarial examples</title><link>https://hugocisneros.com/notes/adversarial_examples/</link><pubDate>Mon, 01 Aug 2022 17:29:00 +0200</pubDate><guid>https://hugocisneros.com/notes/adversarial_examples/</guid><description>tags Machine learning, Neural networks Adversarial examples in Reinforcement learning Adversarial examples in Computer vision Adversarial examples in NLP A Python library for creating and using text attacks: TextAttack.
Figure 1: This diagram illustrates the standard flow of an adversarial attack on text data.
The three components of a text adversarial example:
Goal function: a function that takes an original sentence and an attacked sentence, and computes a score as well as the result of the attack (successful or not).</description></item><item><title>Snake in the tunnel</title><link>https://hugocisneros.com/notes/snake_in_the_tunnel/</link><pubDate>Mon, 01 Aug 2022 16:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/snake_in_the_tunnel/</guid><description> tags Economics, Europe resources Wikipedia</description></item><item><title>Europe</title><link>https://hugocisneros.com/notes/europe/</link><pubDate>Mon, 01 Aug 2022 15:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/europe/</guid><description/></item><item><title>GPT</title><link>https://hugocisneros.com/notes/gpt/</link><pubDate>Wed, 27 Jul 2022 12:19:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gpt/</guid><description> tags Transformers, NLP paper (Radford et al. 2018) Successors The GPT architecture was improved upon and extended into GPT-2 and GPT-3. The original &amp;ldquo;GPT-1&amp;rdquo; was quickly abandoned in favor of its successor, but GPT is still used to refer to this family of models.
Parameter count 117M
Bibliography Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever. 2018. "Improving Language Understanding by Generative Pre-Training".</description></item><item><title>Gato</title><link>https://hugocisneros.com/notes/gato/</link><pubDate>Wed, 27 Jul 2022 12:12:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gato/</guid><description> tags Transformers, Reinforcement learning paper (Reed et al. 2022) Architecture A standard decoder-only transformer is preceded by an embedding layer that embeds text and images, with positional encodings and spatial information where available.
Parameter count 1.2B
Bibliography Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, et al. May 12, 2022. "A Generalist Agent". https://arxiv.org/abs/2205.06175v2.</description></item><item><title>XLNet</title><link>https://hugocisneros.com/notes/xlnet/</link><pubDate>Wed, 27 Jul 2022 12:06:00 +0200</pubDate><guid>https://hugocisneros.com/notes/xlnet/</guid><description> tags Transformers, Transformer-XL, NLP paper (Yang et al. 2020) Architecture The model adapts Transformer-XL to be a permutation-based language model.
Parameter count Base = 117M, Large = 360M
Bibliography Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. January 2, 2020. "XLNet: Generalized Autoregressive Pretraining for Language Understanding".</description></item><item><title>XLM-RoBERTa</title><link>https://hugocisneros.com/notes/xlm_roberta/</link><pubDate>Wed, 27 Jul 2022 12:04:00 +0200</pubDate><guid>https://hugocisneros.com/notes/xlm_roberta/</guid><description>tags Transformers, RoBERTa, NLP paper (Conneau et al. 2020) Architecture The model is an extension of RoBERTa that introduces minor parameter-tuning insights for multilingual applications.
Parameter count Base = 270M, Large = 550M
Bibliography Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov. April 7, 2020. "Unsupervised Cross-lingual Representation Learning at Scale".</description></item><item><title>Wu Dao 2.0</title><link>https://hugocisneros.com/notes/wu_dao_2_0/</link><pubDate>Wed, 27 Jul 2022 11:52:00 +0200</pubDate><guid>https://hugocisneros.com/notes/wu_dao_2_0/</guid><description>tags Transformers, NLP website Wikipedia page for Wu Dao Architecture It is a decoder architecture similar to GPT, but it uses a different pre-training task.
Parameter count 1.75T</description></item><item><title>Turing-NLG</title><link>https://hugocisneros.com/notes/turing_nlg/</link><pubDate>Wed, 27 Jul 2022 11:48:00 +0200</pubDate><guid>https://hugocisneros.com/notes/turing_nlg/</guid><description>tags Transformers, GPT, NLP website Microsoft Project Turing Architecture The architecture is similar to GPT-2 and GPT-3, with some parameter optimization and a software/hardware platform to improve training.
Parameter count 17B originally, now up to 530B.</description></item><item><title>Vision transformer</title><link>https://hugocisneros.com/notes/vision_transformer/</link><pubDate>Wed, 27 Jul 2022 11:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/vision_transformer/</guid><description> tags Transformers, Computer vision, BERT paper (Dosovitskiy et al. 2021) Architecture It is an extension of the BERT architecture that can be trained on patches of images.
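A minimal NumPy sketch of how an image becomes a sequence of patch tokens, assuming an (H, W, C) array whose sides are multiples of the patch size (16, as in the paper title). The projection matrix and position embeddings stand in for learned parameters, and the extra classification token used by ViT is omitted:

```python
import numpy as np


def patchify(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping patches, row by row.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    grid = image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, cols, patch_size, patch_size, C)
    return grid.reshape(-1, patch_size * patch_size * c)


def embed_patches(patches, projection, positions):
    # Linear projection of each flattened patch plus one position embedding per patch.
    return patches @ projection + positions
```

A 224x224 RGB image yields 196 tokens of dimension 768 before projection.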
Parameter count 86M to 632M
Bibliography Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, et al. June 3, 2021. "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale".</description></item><item><title>Transformer-XL</title><link>https://hugocisneros.com/notes/transformer_xl/</link><pubDate>Wed, 27 Jul 2022 11:42:00 +0200</pubDate><guid>https://hugocisneros.com/notes/transformer_xl/</guid><description> tags Transformers, NLP paper (Dai et al. 2019) Architecture This model uses relative positional embeddings to attend over longer contexts than the vanilla Transformer.
Parameter count 151M
Bibliography Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. June 2, 2019. "Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context".</description></item><item><title>Trajectory transformer</title><link>https://hugocisneros.com/notes/trajectory_transformer/</link><pubDate>Wed, 27 Jul 2022 11:41:00 +0200</pubDate><guid>https://hugocisneros.com/notes/trajectory_transformer/</guid><description> tags Transformers, Reinforcement learning, GPT paper (Janner et al. 2021) Architecture It is a similar model to Decision transformer, with some added techniques to encode a trajectory.
Bibliography Michael Janner, Qiyang Li, Sergey Levine. November 28, 2021. "Offline Reinforcement Learning as One Big Sequence Modeling Problem".</description></item><item><title>T5</title><link>https://hugocisneros.com/notes/t5/</link><pubDate>Wed, 27 Jul 2022 11:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/t5/</guid><description> tags Transformers, NLP paper (Raffel et al. 2020) Architecture It is the same as the original transformer, with relative positional embeddings added (similar to Transformer-XL).
Parameter count 11B
Bibliography Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. July 28, 2020. "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer".</description></item><item><title>Swin Transformer</title><link>https://hugocisneros.com/notes/swin_transformer/</link><pubDate>Wed, 27 Jul 2022 11:04:00 +0200</pubDate><guid>https://hugocisneros.com/notes/swin_transformer/</guid><description> tags Transformers, ViT, Computer vision paper (Liu et al. 2021) Architecture This model extends ViT by replacing the multi-head self-attention with a &amp;ldquo;shifted windows&amp;rdquo; module, allowing it to work with higher-resolution images.
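A minimal NumPy sketch of the window partitioning behind this module, assuming an (H, W, C) feature map whose sides are multiples of the window size. Self-attention is then computed inside each window only; in alternating blocks the map is first shifted cyclically by half a window, so that information flows across window borders. The attention mask Swin applies to wrapped-around tokens is left out of this sketch:

```python
import numpy as np


def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping window x window blocks."""
    h, w, c = x.shape
    x = x.reshape(h // window, window, w // window, window, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window, window, c)


def shifted_window_partition(x, window):
    # Cyclic shift toward the top-left by half a window, then the same partition:
    # each new window straddles the borders of the previous layer's windows.
    shift = window // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window)
```

Because attention stays inside fixed-size windows, its cost grows linearly with the number of tokens rather than quadratically.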
Parameter count 29M - 197M
Bibliography Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo. August 17, 2021. "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows".</description></item><item><title>SeeKer</title><link>https://hugocisneros.com/notes/seeker/</link><pubDate>Wed, 27 Jul 2022 11:01:00 +0200</pubDate><guid>https://hugocisneros.com/notes/seeker/</guid><description>tags Transformers, GPT paper (Shuster et al. 2022) Architecture This is an extension that can be applied to any Transformer model by introducing “search”, “knowledge”, and “response” modules during pre-training of the model. It has the same applications as the base model it extends.
Parameter count Depends on the base model being extended.
Bibliography Kurt Shuster, Mojtaba Komeili, Leonard Adolphs, Stephen Roller, Arthur Szlam, Jason Weston. March 29, 2022. "</description></item><item><title>RoBERTa</title><link>https://hugocisneros.com/notes/roberta/</link><pubDate>Wed, 27 Jul 2022 10:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/roberta/</guid><description> tags Transformers, BERT, NLP paper (Liu et al. 2019) Architecture This is an extension of BERT with more data and a better optimized training procedure.
Parameter count 356M
Bibliography Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. July 26, 2019. "RoBERTa: A Robustly Optimized BERT Pretraining Approach". http://arxiv.org/abs/1907.11692.</description></item><item><title>Pegasus</title><link>https://hugocisneros.com/notes/pegasus/</link><pubDate>Wed, 27 Jul 2022 10:45:00 +0200</pubDate><guid>https://hugocisneros.com/notes/pegasus/</guid><description> tags Transformers, NLP paper (Zhang et al. 2020) Architecture This is a standard encoder/decoder architecture with a special pre-training task suited for summarization of text.
Parameter count Base = 223M, Large = 568M
Bibliography Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu. July 10, 2020. "PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization". http://arxiv.org/abs/1912.08777.</description></item><item><title>PaLM</title><link>https://hugocisneros.com/notes/palm/</link><pubDate>Wed, 27 Jul 2022 10:43:00 +0200</pubDate><guid>https://hugocisneros.com/notes/palm/</guid><description>tags Transformers, NLP paper (Chowdhery et al. 2022) Architecture This is a standard decoder-only architecture with some specific extensions:
SwiGLU activation functions, parallel layers, multi-query attention, RoPE embeddings, shared input-output embeddings, no biases, and a 256k SentencePiece vocabulary generated from the training data.
Parameter count 540B
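A minimal NumPy sketch of a bias-free SwiGLU feed-forward block, assuming three weight matrices (`w_gate`, `w_up` and `w_down` are illustrative names) and Swish with \(\beta = 1\). The gated hidden activation is Swish(xW) multiplied elementwise by xV, then projected back with a third matrix:

```python
import numpy as np


def swish(z):
    # Swish / SiLU with beta = 1: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """Bias-free SwiGLU block: (Swish(x W) * (x V)) W2 (illustrative sketch)."""
    return (swish(x @ w_gate) * (x @ w_up)) @ w_down
```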
Bibliography Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, et al. April 19, 2022. "PaLM: Scaling Language Modeling with Pathways"</description></item><item><title>OPT: Open Pre-trained Transformer</title><link>https://hugocisneros.com/notes/opt/</link><pubDate>Wed, 27 Jul 2022 10:40:00 +0200</pubDate><guid>https://hugocisneros.com/notes/opt/</guid><description> tags Transformers, GPT, NLP paper (Zhang et al. 2022) Architecture It is the same architecture as GPT-3 but with some training improvements from Megatron.
Parameter count 175B
Bibliography Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, et al. June 21, 2022. "OPT: Open Pre-trained Transformer Language Models". http://arxiv.org/abs/2205.01068.</description></item><item><title>Minerva</title><link>https://hugocisneros.com/notes/minerva/</link><pubDate>Tue, 26 Jul 2022 15:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/minerva/</guid><description> tags Transformers, Mathematics, PaLM paper (Lewkowycz et al. 2022) Architecture This model is PaLM fine-tuned on mathematical datasets.
Parameter count 540B
Bibliography Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, et al. June 30, 2022. "Solving Quantitative Reasoning Problems with Language Models". http://arxiv.org/abs/2206.14858.</description></item><item><title>Megatron</title><link>https://hugocisneros.com/notes/megatron/</link><pubDate>Tue, 26 Jul 2022 15:18:00 +0200</pubDate><guid>https://hugocisneros.com/notes/megatron/</guid><description> tags Transformers, GPT, BERT, T5 paper (Shoeybi et al. 2020) Architecture The principle of Megatron is to extend existing architectures by using model parallelism. Its number of parameters depends on the base model used.
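A minimal NumPy sketch of the tensor (intra-layer) model parallelism Megatron-LM applies to the Transformer MLP block, with devices simulated by list entries: the first weight matrix is split by columns so each shard can apply GeLU independently, the second is split by rows, and a plain sum stands in for the all-reduce. The function names and the tanh approximation of GeLU are assumptions of this sketch:

```python
import numpy as np


def gelu(z):
    # tanh approximation of GeLU (elementwise, so it commutes with column splits)
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))


def mlp_reference(x, a, b):
    """Unsharded MLP: GeLU(X A) B."""
    return gelu(x @ a) @ b


def mlp_tensor_parallel(x, a, b, n_devices):
    # Split A by columns: each simulated device computes its slice of GeLU(X A).
    a_shards = np.split(a, n_devices, axis=1)
    # Split B by rows so each device multiplies only its own hidden slice.
    b_shards = np.split(b, n_devices, axis=0)
    partials = [gelu(x @ a_i) @ b_i for a_i, b_i in zip(a_shards, b_shards)]
    # Summing the partial outputs plays the role of the all-reduce.
    return sum(partials)
```

Only one reduction is needed per MLP block, which is why this split is attractive when devices are connected by fast links.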
Bibliography Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro. March 13, 2020. "Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism".</description></item><item><title>mBART</title><link>https://hugocisneros.com/notes/mbart/</link><pubDate>Tue, 26 Jul 2022 15:15:00 +0200</pubDate><guid>https://hugocisneros.com/notes/mbart/</guid><description> tags Transformers, NLP, BART paper (Liu et al. 2020) Architecture It&amp;rsquo;s an encoder-decoder architecture based on BART.
Bibliography Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer. January 23, 2020. "Multilingual Denoising Pre-training for Neural Machine Translation".</description></item><item><title>LAMDA</title><link>https://hugocisneros.com/notes/lamda/</link><pubDate>Tue, 26 Jul 2022 11:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/lamda/</guid><description> tags Transformers, NLP paper (Thoppilan et al. 2022) Parameter count 137B
Bibliography Romal Thoppilan, Daniel De Freitas, Jamie Hall, Noam Shazeer, Apoorv Kulshreshtha, Heng-Tze Cheng, Alicia Jin, et al. February 10, 2022. "LaMDA: Language Models for Dialog Applications". http://arxiv.org/abs/2201.08239.</description></item><item><title>Jurassic-1</title><link>https://hugocisneros.com/notes/jurassic_1/</link><pubDate>Tue, 26 Jul 2022 11:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/jurassic_1/</guid><description> tags Transformers, GPT, NLP blog post AI21Labs blog Architecture This model is similar to GPT-3 with an improved tokenizer that increases the learning efficiency. It also has more parameters.
Parameter count 178B
Bibliography</description></item><item><title>Imagen</title><link>https://hugocisneros.com/notes/imagen/</link><pubDate>Tue, 26 Jul 2022 11:41:00 +0200</pubDate><guid>https://hugocisneros.com/notes/imagen/</guid><description>tags Transformers, Diffusion models, Computer vision, NLP, T5, CLIP paper (Saharia et al. 2022) Architecture This is based on the U-net diffusion architecture with a few extensions. T5 or CLIP or BERT is used as a frozen text encoder.
Parameter count 2B
Bibliography Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, et al. May 23, 2022. "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"</description></item><item><title>GPTInstruct</title><link>https://hugocisneros.com/notes/gptinstruct/</link><pubDate>Tue, 26 Jul 2022 11:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gptinstruct/</guid><description> tags Transformers, GPT, NLP paper (Ouyang et al. 2022) Architecture This model starts off from a pretrained GPT-3, which is then fine-tuned with a learned reward model and Reinforcement learning.
Parameter count 175B
Bibliography Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, et al. March 4, 2022. "Training Language Models to Follow Instructions with Human Feedback".</description></item><item><title>GPT-Neo</title><link>https://hugocisneros.com/notes/gpt_neo/</link><pubDate>Tue, 26 Jul 2022 11:06:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gpt_neo/</guid><description> tags Transformers, GPT, NLP software &amp;lt;&amp;amp;gpt-neo&amp;gt; Architecture This model is very similar to GPT-2, with the addition of local attention every other layer and a window size of 256 tokens.
Parameter count 1.5B, 2.7B (XL)
Bibliography</description></item><item><title>Global context ViT</title><link>https://hugocisneros.com/notes/global_context_vit/</link><pubDate>Tue, 26 Jul 2022 11:04:00 +0200</pubDate><guid>https://hugocisneros.com/notes/global_context_vit/</guid><description> tags Transformers, Computer vision, ViT paper (Hatamizadeh et al. 2022) Architecture This is a hierarchical version of ViT with both local and global attention.
Parameter count 90M
Bibliography Ali Hatamizadeh, Hongxu Yin, Jan Kautz, Pavlo Molchanov. June 20, 2022. "Global Context Vision Transformers".</description></item><item><title>GLIDE</title><link>https://hugocisneros.com/notes/glide/</link><pubDate>Tue, 26 Jul 2022 11:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/glide/</guid><description> tags Diffusion models, NLP, Computer vision paper (Nichol et al. 2022) Architecture This model uses a diffusion model with joint textual and visual embeddings, followed by an upsampling stage.
Parameter count 3.5B
Bibliography Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, Mark Chen. March 8, 2022. "GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models".</description></item><item><title>GPT-3</title><link>https://hugocisneros.com/notes/gpt_3/</link><pubDate>Tue, 26 Jul 2022 10:06:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gpt_3/</guid><description> tags Transformers, NLP, GPT paper (Brown et al. 2020) Architecture Like GPT-2, with the addition of locally banded sparse attention.
Parameter count 175B
Bibliography Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. June 4, 2020. "Language Models Are Few-Shot Learners". http://arxiv.org/abs/2005.14165.</description></item><item><title>GPT-2</title><link>https://hugocisneros.com/notes/gpt_2/</link><pubDate>Tue, 26 Jul 2022 10:04:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gpt_2/</guid><description> tags Transformers, GPT paper (Radford et al. 2019) Architecture Some minor changes from GPT, such as a larger context and a different placement of layer normalization.
Parameter count 1.5B
Bibliography Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. 2019. "Language Models Are Unsupervised Multitask Learners". OpenAI Blog 1 (8):9.</description></item><item><title>GLaM</title><link>https://hugocisneros.com/notes/glam/</link><pubDate>Tue, 26 Jul 2022 10:01:00 +0200</pubDate><guid>https://hugocisneros.com/notes/glam/</guid><description>tags Transformers, NLP paper (Du et al. 2021) Architecture The model is a decoder-only transformer in which feed-forward layers are replaced by mixture-of-experts layers with 64 experts. Two experts are activated per token, making the model relatively efficient for its number of parameters.
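A minimal NumPy sketch of top-2 expert routing, assuming a softmax router and experts given as plain callables (real experts are feed-forward networks). The two selected gate values are renormalised to sum to one, which is a common choice in top-2 gating rather than a confirmed detail of GLaM:

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def top2_moe_layer(x, gate_w, experts):
    """x: (tokens, d); gate_w: (d, n_experts); experts: list of callables mapping d -> d."""
    probs = softmax(x @ gate_w)                # router probabilities per token
    top2 = np.argsort(probs, axis=-1)[:, -2:]  # indices of the two highest-scoring experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = top2[t]
        weights = probs[t, chosen] / probs[t, chosen].sum()  # renormalise the two gates
        for w, e in zip(weights, chosen):
            out[t] += w * experts[e](x[t])     # only 2 experts run for this token
    return out, top2
```

Each token passes through only 2 of the experts, which is why the number of parameters active per token is much smaller than the total.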
Parameter count 1.2T total, 96B active per token.
Bibliography Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, et al. December 13, 2021. "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts"</description></item><item><title>Flamingo</title><link>https://hugocisneros.com/notes/flamingo/</link><pubDate>Tue, 26 Jul 2022 09:56:00 +0200</pubDate><guid>https://hugocisneros.com/notes/flamingo/</guid><description> tags Transformers, Computer vision, NLP, Chinchilla paper (Alayrac et al. 2022) Architecture Uses a frozen language model (e.g. Chinchilla) that is conditioned on a visual representation produced by a normalizer-free ResNet.
Parameter count 80B
Bibliography Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, et al. April 29, 2022. "Flamingo: A Visual Language Model for Few-Shot Learning". http://arxiv.org/abs/2204.14198.</description></item><item><title>ERNIE</title><link>https://hugocisneros.com/notes/ernie/</link><pubDate>Tue, 26 Jul 2022 09:51:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ernie/</guid><description> tags Transformers, BERT, NLP paper (Zhang et al. 2019) Architecture This transformer uses two stacked BERT-style encoders: one for the text and one for the entities of a knowledge graph.
Parameter count 114M
Bibliography Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun, Qun Liu. June 4, 2019. "ERNIE: Enhanced Language Representation with Informative Entities".</description></item><item><title>ELECTRA</title><link>https://hugocisneros.com/notes/electra/</link><pubDate>Tue, 26 Jul 2022 09:08:00 +0200</pubDate><guid>https://hugocisneros.com/notes/electra/</guid><description> tags Transformers, NLP paper (Clark et al. 2020) Parameter count Base = 110M, Large = 330M
Bibliography Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning. 2020. "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators". In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=r1xMH1BtvB.</description></item><item><title>DQ-BART</title><link>https://hugocisneros.com/notes/dq_bart/</link><pubDate>Tue, 26 Jul 2022 09:05:00 +0200</pubDate><guid>https://hugocisneros.com/notes/dq_bart/</guid><description> tags Transformers, BART, NLP paper (Li et al. 2022) Architecture It is a distilled and quantized version of BART. It greatly reduces the model size while keeping performance close to that of the original model.
Bibliography Zheng Li, Zijian Wang, Ming Tan, Ramesh Nallapati, Parminder Bhatia, Andrew Arnold, Bing Xiang, Dan Roth. March 21, 2022. "DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization".</description></item><item><title>DistillBERT</title><link>https://hugocisneros.com/notes/distillbert/</link><pubDate>Tue, 26 Jul 2022 08:43:00 +0200</pubDate><guid>https://hugocisneros.com/notes/distillbert/</guid><description> tags Transformers, BERT, NLP paper (Sanh et al. 2020) Architecture It is a distilled version of BERT that is 40% smaller and 60% faster while retaining most of BERT&amp;rsquo;s language understanding performance.
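A minimal NumPy sketch of the soft-target part of a distillation objective, written as a KL divergence between temperature-softened teacher and student distributions with the common \(T^2\) scaling (this differs from the cross-entropy form only by the teacher's entropy, a constant for the student). The temperature value is an assumption, and DistilBERT's full training loss also includes masked language modeling and a cosine embedding term, which are omitted here:

```python
import numpy as np


def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2, averaged over the batch."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    kl = np.sum(p_teacher * (np.log(p_teacher) - log_p_student), axis=-1)
    return float(np.mean(kl) * temperature ** 2)
```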
Parameter count 66M
Bibliography Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. February 29, 2020. "DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter".</description></item><item><title>DialoGPT</title><link>https://hugocisneros.com/notes/dialogpt/</link><pubDate>Fri, 22 Jul 2022 13:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/dialogpt/</guid><description> tags GPT, Transformers, NLP paper (Zhang et al. 2020) Architecture It is exactly like a GPT-2 architecture but trained on dialog data.
Parameter count 1.5B
Bibliography Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan. May 2, 2020. "DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation".</description></item><item><title>Decision transformer</title><link>https://hugocisneros.com/notes/decision_transformer/</link><pubDate>Fri, 22 Jul 2022 13:03:00 +0200</pubDate><guid>https://hugocisneros.com/notes/decision_transformer/</guid><description>tags Transformers, GPT, Reinforcement learning paper (Chen et al. 2021) Architecture This is a decoder model that uses a GPT-like model to encode and predict trajectories for Reinforcement learning tasks. It has essentially the same characteristics as GPT.
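A minimal Python sketch of how a trajectory can be turned into the token sequence a Decision transformer consumes: rewards are replaced by returns-to-go (the sum of rewards from each step to the end), and returns, states and actions are interleaved. The embedding layers and timestep encodings of the real model are omitted, and the tuple labels are illustrative:

```python
import numpy as np


def returns_to_go(rewards):
    """R_t = sum of rewards from step t to the end of the trajectory."""
    return np.cumsum(rewards[::-1])[::-1]


def interleave_tokens(rewards, states, actions):
    """Build the (R_1, s_1, a_1, R_2, s_2, a_2, ...) sequence (illustrative labels)."""
    rtg = returns_to_go(np.asarray(rewards, dtype=float))
    sequence = []
    for r, s, a in zip(rtg, states, actions):
        sequence.extend([("return", float(r)), ("state", s), ("action", a)])
    return sequence
```

At test time, conditioning on a high initial return-to-go is what asks the model for high-reward behavior; the target is decreased by each observed reward as the episode unfolds.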
Bibliography Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch. June 24, 2021. "Decision Transformer: Reinforcement Learning via Sequence Modeling".</description></item><item><title>ALBERT</title><link>https://hugocisneros.com/notes/albert/</link><pubDate>Fri, 22 Jul 2022 13:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/albert/</guid><description>tags Transformers, BERT, NLP paper (Lan et al. 2020) Architecture It is an encoder-only architecture. It extends BERT with cross-layer parameter sharing and a factorized embedding parameterization, which give it far fewer parameters than a BERT model of the same depth and width.
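A back-of-the-envelope Python sketch of the two ALBERT parameter reductions, assuming BERT-base-like sizes (vocabulary \(V = 30000\), hidden size \(H = 768\)) and an embedding size \(E = 128\): factorizing the embedding matrix replaces \(V \times H\) parameters with \(V \times E + E \times H\), and cross-layer sharing stores one layer's weights instead of one set per layer:

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """V*H for a BERT-style embedding, V*E + E*H when it is factorized."""
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(params_per_layer, num_layers, share_layers):
    """With cross-layer sharing, every layer reuses one set of weights."""
    return params_per_layer if share_layers else params_per_layer * num_layers


V, H, E = 30000, 768, 128  # assumed sizes for illustration
print(embedding_params(V, H))     # 23040000
print(embedding_params(V, H, E))  # 3938304
```

The embedding alone shrinks by roughly a factor of six under these assumptions, before any sharing of the encoder layers.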
Parameter count Base = 12M, Large = 18M, XLarge = 60M
Bibliography Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. February 8, 2020. "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"</description></item><item><title>BERT</title><link>https://hugocisneros.com/notes/bert/</link><pubDate>Fri, 22 Jul 2022 13:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/bert/</guid><description> tags Transformers, NLP paper (Devlin et al. 2019) Parameter count Base = 110M, Large = 340M
Bibliography Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. May 24, 2019. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".</description></item><item><title>BLOOM</title><link>https://hugocisneros.com/notes/bloom/</link><pubDate>Fri, 22 Jul 2022 13:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/bloom/</guid><description> tags Transformers, GPT, NLP blog post BLOOM announcement blog post Architecture It is similar to the architecture of GPT-3, using full attention instead of sparse attention.
Parameter count 176B
Bibliography</description></item><item><title>CTRL</title><link>https://hugocisneros.com/notes/ctrl/</link><pubDate>Fri, 22 Jul 2022 13:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ctrl/</guid><description>tags Transformers, NLP paper (Keskar et al. 2019) Architecture This is a model that can generate text conditioned on control codes that specify the domain, style, topics, dates, entities, relationships between entities, plot points, and task-related behavior of the text.
Parameter count 1.63B
Bibliography Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, Richard Socher. September 20, 2019. "CTRL: A Conditional Transformer Language Model for Controllable Generation". DOI.</description></item><item><title>Big bird</title><link>https://hugocisneros.com/notes/big_bird/</link><pubDate>Fri, 22 Jul 2022 13:01:00 +0200</pubDate><guid>https://hugocisneros.com/notes/big_bird/</guid><description>tags Transformers, NLP paper (Zaheer et al. 2021) Architecture Big Bird can be used as both an encoder-only and an encoder/decoder architecture.
It extends models like BERT with a sparse attention mechanism (combining local window, global, and random attention), which reduces the cost of attention from quadratic to linear in the sequence length.
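To make the sparsity pattern concrete, here is a minimal token-level sketch of a Big Bird-style attention mask that combines a sliding window, a few global tokens, and a few random keys per query. The function `bigbird_mask` and its parameters are hypothetical, and the actual model applies this pattern to blocks of tokens rather than to individual tokens. Because each ordinary query attends to a bounded number of keys, the number of computed query-key pairs grows linearly with the sequence length.

```python
import random

def bigbird_mask(n, window=1, num_global=1, num_random=1, seed=0):
    """Boolean n x n mask; mask[i][j] is True if query i may attend to key j.

    Hypothetical token-level sketch: sliding window + global tokens + random keys.
    """
    rng = random.Random(seed)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        # Sliding window: keys within `window` positions of the query.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            mask[i][j] = True
        # Random attention: a few extra keys drawn uniformly at random.
        for j in rng.sample(range(n), num_random):
            mask[i][j] = True
    # Global tokens (here the first `num_global` positions) attend to everything
    # and are attended to by every query.
    for g in range(num_global):
        for k in range(n):
            mask[g][k] = True
            mask[k][g] = True
    return mask

mask = bigbird_mask(16, window=1, num_global=1, num_random=1)
allowed = sum(row.count(True) for row in mask)
print(allowed, "of", 16 * 16, "query-key pairs are computed")
```

With a fixed window, number of global tokens, and number of random keys, each non-global row has at most `2 * window + 1 + num_random + num_global` allowed entries, whatever the value of `n`.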
Bibliography Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, et al. January 8, 2021. "Big Bird: Transformers for Longer Sequences". DOI.</description></item><item><title>DALL-E-2</title><link>https://hugocisneros.com/notes/dall_e_2/</link><pubDate>Fri, 22 Jul 2022 13:00:00 +0200</pubDate><guid>https://hugocisneros.com/notes/dall_e_2/</guid><description>tags Transformers, Diffusion models, CLIP paper (Ramesh et al. 2022) Architecture This is the successor of DALL-E. It is an encoder/decoder model that combines CLIP and Diffusion models to generate images from text. The diffusion decoder is similar to GLIDE.
Parameter count 3.5B
Bibliography Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen. April 12, 2022. "Hierarchical Text-Conditional Image Generation with CLIP Latents". DOI.</description></item><item><title>DALL-E</title><link>https://hugocisneros.com/notes/dall_e/</link><pubDate>Fri, 22 Jul 2022 12:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/dall_e/</guid><description> tags Transformers, GPT paper (Ramesh et al. 2021) Architecture It is a decoder architecture that combines a discrete variational autoencoder with a variant of GPT-3 to convert text to images.
Parameter count 12B
Bibliography Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, Ilya Sutskever. February 26, 2021. "Zero-Shot Text-to-Image Generation". DOI.</description></item><item><title>CLIP</title><link>https://hugocisneros.com/notes/clip/</link><pubDate>Fri, 22 Jul 2022 12:29:00 +0200</pubDate><guid>https://hugocisneros.com/notes/clip/</guid><description> tags Transformers, NLP, Computer vision paper (Radford et al. 2021) Architecture It is an encoder-only model that uses either a ViT or a ResNet to encode images and a Transformer to encode text.
Bibliography Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, et al. February 26, 2021. "Learning Transferable Visual Models From Natural Language Supervision". DOI.</description></item><item><title>Residual neural networks</title><link>https://hugocisneros.com/notes/residual_networks/</link><pubDate>Fri, 22 Jul 2022 12:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/residual_networks/</guid><description>tags Neural networks, Convolutional neural networks, Computer vision resources (He et al. 2016) Residual neural networks are neural networks with skip connections (also called shortcuts or residual connections) that bypass some of the network&amp;rsquo;s layers, so that a block outputs its input plus a learned residual.
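Here is a minimal sketch of a residual block on plain Python lists. It assumes a toy two-layer branch F(x) = W2 relu(W1 x); the weights and helper names are illustrative. The skip connection adds the input back to the branch output. As a result, when the branch outputs zeros, the block reduces to the identity, which is one common intuition for why very deep residual networks remain trainable.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, v):
    """Matrix-vector product, with the weights given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def residual_block(v, w1, w2):
    """y = x + F(x), with F(x) = W2 relu(W1 x) as a toy two-layer branch."""
    fx = linear(w2, relu(linear(w1, v)))
    # Skip connection: add the input back to the branch output.
    return [x + f for x, f in zip(v, fx)]

x = [1.0, -2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zero, zero))  # [1.0, -2.0]: a zero branch gives the identity
```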
Highway networks (Srivastava et al. 2015)
DenseNets (Huang, Liu n.d.)
Bibliography Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. June 2016. "Deep Residual Learning for Image Recognition". In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78.</description></item><item><title>Chinchilla</title><link>https://hugocisneros.com/notes/chinchilla/</link><pubDate>Fri, 22 Jul 2022 12:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chinchilla/</guid><description> tags Transformers, GPT, NLP paper (Hoffmann et al. 2022) Architecture This model is very similar to Gopher, but it is trained compute-optimally: for roughly the same training compute, it has 4 times fewer parameters (70B instead of 280B) and is trained on much more data (1.4T tokens instead of 300B).
Parameter count 70B
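Here is a back-of-the-envelope sketch of the compute-optimal trade-off. It relies on two common approximations, which are assumptions here rather than exact results from the paper: training compute is about 6 N D FLOPs for N parameters and D training tokens, and the compute-optimal ratio is roughly 20 tokens per parameter.

```python
import math

TOKENS_PER_PARAM = 20.0  # assumed rule of thumb, not an exact constant from the paper

def compute_optimal(flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget into (params, tokens), assuming C ~= 6 * N * D and D = r * N."""
    n_params = math.sqrt(flops / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params

gopher_flops = 6.0 * 280e9 * 300e9  # approximate training compute of Gopher
n, d = compute_optimal(gopher_flops)
print(f"{n / 1e9:.0f}B parameters, {d / 1e12:.1f}T tokens")
```

Applied to Gopher's approximate budget (280B parameters trained on 300B tokens), this suggests roughly 65B parameters and 1.3T tokens. That is in the same range as Chinchilla's 70B parameters and 1.4T tokens.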
Bibliography Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, et al. March 29, 2022. "Training Compute-Optimal Large Language Models". DOI.</description></item><item><title>Positional encoding</title><link>https://hugocisneros.com/notes/positional_encoding/</link><pubDate>Fri, 22 Jul 2022 11:57:00 +0200</pubDate><guid>https://hugocisneros.com/notes/positional_encoding/</guid><description> tags Transformers, Attention</description></item><item><title>BART</title><link>https://hugocisneros.com/notes/bart/</link><pubDate>Fri, 22 Jul 2022 10:11:00 +0200</pubDate><guid>https://hugocisneros.com/notes/bart/</guid><description> tags Transformers paper (Lewis et al. 2019) Architecture It is an encoder/decoder architecture. The encoder is bidirectional like BERT and the decoder is autoregressive like GPT, so it generalizes the two models into a single one.
Bibliography Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. October 29, 2019. "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". DOI.</description></item><item><title>Byte-pair encoding</title><link>https://hugocisneros.com/notes/byte_pair_encoding/</link><pubDate>Thu, 14 Jul 2022 09:34:00 +0200</pubDate><guid>https://hugocisneros.com/notes/byte_pair_encoding/</guid><description>tags NLP The process of byte-pair encoding can be summarized as follows:
Each character starts as a token. Find the pair of adjacent tokens that occurs most often. Create a new token that encodes this pair and merge its occurrences. Repeat the process until the target vocabulary size is reached. The output of this process is both a vocabulary and a set of merge rules that can be applied to tokenize new data.
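Here is a minimal sketch of the training loop above on a toy word list. It assumes characters as the initial tokens; byte-level BPE starts from bytes, and practical tokenizers usually also mark word boundaries. The helper names are illustrative. Ties between equally frequent pairs are broken by first occurrence, which is one arbitrary but deterministic choice.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace each left-to-right occurrence of `pair` by one merged symbol."""
    merged, skip = [], False
    for a, b in zip(symbols, symbols[1:] + [None]):
        if skip:  # `a` was already absorbed into the previous merge
            skip = False
            continue
        if (a, b) == pair:
            merged.append(a + b)
            skip = True
        else:
            merged.append(a)
    return merged

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    corpus = Counter(tuple(w) for w in words)  # segmented word -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single token
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        corpus = Counter({tuple(merge_pair(list(s), best)): f for s, f in corpus.items()})
    return merges

def encode(word, merges):
    """Tokenize a new word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

words = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(words, 4)
print(merges)                    # [('e', 's'), ('es', 't'), ('l', 'o'), ('lo', 'w')]
print(encode("lowest", merges))  # ['low', 'est']
```

The word "lowest" never appears in the toy corpus. It is still split into known subwords, because the learned merges are replayed on it in the order they were learned.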
This technique has several advantages:</description></item><item><title>Land-value tax</title><link>https://hugocisneros.com/notes/land_value_tax/</link><pubDate>Wed, 15 Jun 2022 13:25:00 +0200</pubDate><guid>https://hugocisneros.com/notes/land_value_tax/</guid><description> tags Economics, Taxation resources Wikipedia</description></item><item><title>Taxation</title><link>https://hugocisneros.com/notes/taxation/</link><pubDate>Wed, 15 Jun 2022 13:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/taxation/</guid><description> tags Economics</description></item><item><title>Motion planning</title><link>https://hugocisneros.com/notes/motion_planning/</link><pubDate>Wed, 15 Jun 2022 13:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/motion_planning/</guid><description> tags Artificial Intelligence</description></item><item><title>Pytorch</title><link>https://hugocisneros.com/notes/pytorch/</link><pubDate>Tue, 07 Jun 2022 11:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/pytorch/</guid><description>tags Python, Machine learning PyTorch is an automatic differentiation (autodiff) library for machine learning in Python.
Pytorch tricks I don&amp;rsquo;t know who originally made this list. I also don&amp;rsquo;t know how many of those have been addressed in recent versions. If some of these tricks are no longer valid, let me know:
DataLoader has poor default settings: set num_workers &amp;gt; 0 and pin_memory = True. Use torch.backends.cudnn.benchmark = True to let cuDNN autotune its kernel choice. Max out the batch size on each GPU to amortize compute.</description></item><item><title>Article: Uncertain times</title><link>https://hugocisneros.com/notes/article_uncertain_times/</link><pubDate>Tue, 07 Jun 2022 09:11:00 +0200</pubDate><guid>https://hugocisneros.com/notes/article_uncertain_times/</guid><description>authors Melanie Mitchell, Jessica Flack source Aeon tags Complex Systems This article is about adopting a complex systems-based view of societal phenomena. All human societies are collectives of individuals with coupled behaviors.
As a result, large-scale society-wide information and local behaviors are coupled. This can be appealing, but it also leads to surprising behavior. In complex systems, noise feeding back onto itself can lead to transitions to orderly states. In practice, this means that individually positive, slightly random decisions can result in negative society-scale behavior.</description></item><item><title>Open-ended Evolution</title><link>https://hugocisneros.com/notes/open_ended_evolution/</link><pubDate>Mon, 06 Jun 2022 16:03:00 +0200</pubDate><guid>https://hugocisneros.com/notes/open_ended_evolution/</guid><description>tags Evolution, Complexity, Artificial Intelligence resources Open-endedness: The last grand challenge you’ve never heard of Objectives and open-endedness Objective functions are commonly used in all areas of machine learning (even so-called unsupervised learning, or evolutionary strategies). However, this seems fundamentally opposed to how natural evolution proceeds. Innovations appear without a priori objectives, and it isn&amp;rsquo;t clear whether setting such objectives ourselves slows down progress by focusing on overly narrow paths to a solution (Woolley, Stanley 2011; Stanley, Lehman 2015).</description></item><item><title>Woke</title><link>https://hugocisneros.com/notes/woke/</link><pubDate>Thu, 02 Jun 2022 10:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/woke/</guid><description>tags Society Genealogy of the term From (Cammaerts 2022):
Woke is intrinsically tied to black consciousness and anti-racist struggles. It was a black slang word which was first referenced in popular culture during a spoken word section at the end of a recording of the 1938 protest folksong ‘Scottsboro Boys’ by Lead Belly. The song refers to the gruesome case of nine black youth who were falsely accused of raping two white women and whose lives were destroyed by the deeply racist Alabama justice system (Cose 2020).</description></item><item><title>Economics</title><link>https://hugocisneros.com/notes/economics/</link><pubDate>Thu, 02 Jun 2022 10:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/economics/</guid><description> tags Society</description></item><item><title>Society</title><link>https://hugocisneros.com/notes/society/</link><pubDate>Thu, 02 Jun 2022 10:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/society/</guid><description/></item><item><title>Asset economy</title><link>https://hugocisneros.com/notes/asset_economy/</link><pubDate>Wed, 01 Jun 2022 22:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/asset_economy/</guid><description> tags Economics books (Adkins et al. 2020) Bibliography Lisa Adkins, Martijn Konings, Melinda Cooper. 2020. The Asset Economy: Property Ownership and the New Logic of Inequality. Polity Press.</description></item><item><title>Extractivism</title><link>https://hugocisneros.com/notes/extractivism/</link><pubDate>Wed, 01 Jun 2022 21:44:00 +0200</pubDate><guid>https://hugocisneros.com/notes/extractivism/</guid><description>tags Climate, Economics From (Chagnon et al. 2022), a piece on extractivism as a concept and its relation to globalization:
Extractivism as a concept forms a complex ensemble of self-reinforcing practices, mentalities, and power differentials underwriting and rationalizing socio-ecologically destructive modes of organizing life through subjugation, violence, depletion, and non-reciprocity.
Bibliography Christopher W. Chagnon, Francesco Durante, Barry K. Gills, Sophia E. Hagolani-Albov, Saana Hokkanen, Sohvi M. J. Kangasluoma, Heidi Konttinen, et al.</description></item><item><title>Globalization</title><link>https://hugocisneros.com/notes/globalization/</link><pubDate>Wed, 01 Jun 2022 20:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/globalization/</guid><description> tags Economics, Economic liberalism, Complex Systems</description></item><item><title>Greenhouse gas emissions</title><link>https://hugocisneros.com/notes/greenhouse_gas_emissions/</link><pubDate>Wed, 01 Jun 2022 20:37:00 +0200</pubDate><guid>https://hugocisneros.com/notes/greenhouse_gas_emissions/</guid><description>tags Climate Scopes Greenhouse gas emissions are often classified into 3 scopes, which roughly correspond to how &amp;ldquo;direct&amp;rdquo; the emission is, i.e. how close the actual source emitting the gas is.
Scope 1 covers direct emissions, such as those from fuel combustion and other direct uses of fossil fuels.
Scope 2 covers indirect emissions from energy use. This includes any electricity that was produced from a fossil fuel source.
Scope 3 covers all other indirect emissions anywhere in the value chain.</description></item><item><title>Semantic primes</title><link>https://hugocisneros.com/notes/semantic_primes/</link><pubDate>Wed, 01 Jun 2022 16:12:00 +0200</pubDate><guid>https://hugocisneros.com/notes/semantic_primes/</guid><description> tags Language resources Wikipedia</description></item><item><title>The Scaling Hypothesis</title><link>https://hugocisneros.com/notes/the_scaling_hypothesis/</link><pubDate>Tue, 31 May 2022 15:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/the_scaling_hypothesis/</guid><description>tags Artificial Intelligence, Neural networks From gwern&amp;rsquo;s website:
The scaling hypothesis: neural nets absorb data &amp;amp; compute, generalizing and becoming more Bayesian as problems get harder, manifesting new abilities even at trivial-by-global-standards-scale.</description></item><item><title>Projection on convex sets</title><link>https://hugocisneros.com/notes/projection_on_convex_sets/</link><pubDate>Wed, 25 May 2022 13:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/projection_on_convex_sets/</guid><description>tags Optimization To solve the problem of finding \(x \in \mathbb{R}^n\) such that \(x\in C \cap D\) where \(C\) and \(D\) are closed convex sets, we project a candidate solution onto \(D\) and \(C\) successively until it converges to a point in the intersection.
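Here is a minimal sketch of this alternating scheme. It assumes two sets whose projections have closed forms: C is the box [0, 1]^n and D is the hyperplane {x : a·x = b}. These sets and the helper names are illustrative choices. Each loop iteration applies the update x_{k+1} = P_C(P_D(x_k)).

```python
def project_box(x, lo=0.0, hi=1.0):
    """Euclidean projection onto the box [lo, hi]^n: clip each coordinate."""
    return [min(max(v, lo), hi) for v in x]

def project_hyperplane(x, a, b):
    """Euclidean projection onto the hyperplane {x : a . x = b}."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    norm_sq = sum(ai * ai for ai in a)
    step = (dot - b) / norm_sq
    return [xi - step * ai for ai, xi in zip(a, x)]

def alternating_projections(x, a, b, tol=1e-10, max_iter=1000):
    """Iterate x_{k+1} = P_C(P_D(x_k)) until the iterate stops moving."""
    for _ in range(max_iter):
        x_next = project_box(project_hyperplane(x, a, b))
        change = max(abs(u - v) for u, v in zip(x_next, x))
        x = x_next
        if tol >= change:
            break
    return x

print(alternating_projections([3.0, -1.0], a=[1.0, 1.0], b=1.0))  # [1.0, 0.0]
```

Consider the example with start point (3, -1), a = (1, 1), and b = 1. The first projection lands on the line at (2.5, -1.5), and clipping to the box gives (1, 0). The second iteration leaves this point unchanged, because it already lies in both sets.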
\[ x_{k+1} = \mathcal{P}_C (\mathcal{P}_D (x_k)) \]</description></item><item><title>Notes on: Memorizing Transformers by Wu, Y., Rabe, M. N., Hutchins, D., &amp; Szegedy, C. (2022)</title><link>https://hugocisneros.com/notes/wumemorizingtransformers2022/</link><pubDate>Wed, 25 May 2022 13:26:00 +0200</pubDate><guid>https://hugocisneros.com/notes/wumemorizingtransformers2022/</guid><description>source (Wu et al. 2022) tags Transformers, Memory in neural networks TODO Summary This paper introduces a method to extend the classical Transformer neural network model with an addressable memory that can be queried and updated at inference time.
This memory is addressed using an attention mechanism: it is a set of cached attention (key, value) vector pairs. The memory mechanism is inserted at a single, arbitrarily chosen depth of the attention &amp;ldquo;stack&amp;rdquo;.</description></item><item><title>Memory in neural networks</title><link>https://hugocisneros.com/notes/memory_in_neural_networks/</link><pubDate>Fri, 20 May 2022 23:38:00 +0200</pubDate><guid>https://hugocisneros.com/notes/memory_in_neural_networks/</guid><description> tags Neural networks</description></item><item><title>NEAT</title><link>https://hugocisneros.com/notes/neat/</link><pubDate>Tue, 03 May 2022 11:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/neat/</guid><description> tags Neural architecture search, Evolutionary strategies, Neural networks papers (Stanley, Miikkulainen 2002) Bibliography Kenneth O. Stanley, Risto Miikkulainen. June 2002. "Evolving Neural Networks Through Augmenting Topologies". Evolutionary Computation 10 (2):99–127. DOI.&amp;nbsp;See notes</description></item><item><title>Kerberos</title><link>https://hugocisneros.com/notes/kerberos/</link><pubDate>Tue, 03 May 2022 10:51:00 +0200</pubDate><guid>https://hugocisneros.com/notes/kerberos/</guid><description>tags Network authentication, Cryptography resources Main page, Computerphile video Kerberos is a centralized authentication protocol that relies on symmetric encryption to authenticate users and services on a network with a trusted central entity (e.g. a corporate network).
A central server must hold long-term keys for every user on the network. It uses these keys to securely issue session keys for communicating with other devices on the network, via a Ticket-granting server (TGS).</description></item><item><title>Network authentication</title><link>https://hugocisneros.com/notes/network_authentication/</link><pubDate>Tue, 03 May 2022 10:47:00 +0200</pubDate><guid>https://hugocisneros.com/notes/network_authentication/</guid><description>Protocols Password authentication This is one of the simplest authentication methods. The idea is simply to send a (username, password) pair to the server. It is obviously vulnerable to man-in-the-middle attacks.
Kerberos Secure Socket Layer (SSL) and Transport Layer Security (TLS)</description></item><item><title>Turing-completeness</title><link>https://hugocisneros.com/notes/turing_completeness/</link><pubDate>Tue, 03 May 2022 10:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/turing_completeness/</guid><description>tags Computability theory, Computer science A system is Turing complete if it can be used to simulate any Turing Machine.
Examples of Turing complete systems Some cellular automata Most programming languages Lambda calculus Combinatory logic Others like Post-Turing machines, formal grammars, formal languages, etc. Some games (Minecraft, Baba Is You) and computational languages (markup languages like HTML+CSS) Other surprisingly Turing complete systems, which show that beyond a certain level of complexity it becomes possible to &amp;ldquo;stumble upon&amp;rdquo; Turing completeness.</description></item><item><title>Artificial Intelligence</title><link>https://hugocisneros.com/notes/artificial_intelligence/</link><pubDate>Tue, 03 May 2022 10:45:00 +0200</pubDate><guid>https://hugocisneros.com/notes/artificial_intelligence/</guid><description>Creating artificial intelligence through evolution If we take an evolutionary approach to the creation of AI, we may run into some problems. Of course, it is an extremely appealing idea, as it has been proven to work in our &amp;ldquo;Earth experiment&amp;rdquo;. Somehow, life and intelligent behavior have emerged from the synergy between the emergence of so-called living systems, Darwinian evolution and interactions with the environment.
However, a crucial question is: is there any shortcut in this approach?</description></item><item><title>AI and climate change</title><link>https://hugocisneros.com/notes/ai_and_climate_change/</link><pubDate>Mon, 02 May 2022 20:45:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ai_and_climate_change/</guid><description>tags Machine learning, Artificial Intelligence, Climate A general-purpose resource: (Dobbe, Whittaker 2019).
A more technical resource with machine learning in mind: (Rolnick et al. 2019).
Bibliography R. Dobbe, M. Whittaker. October 2019. "AI and Climate Change: How They’re Connected, and What We Can Do About It". https://medium.com/@AINowInstitute/ai-and-climate-change-how-theyre-connected-and-what-we-can-do-about-it-6aa8d0f5b32c. David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, et al. November 5, 2019.</description></item><item><title>AI capitalism</title><link>https://hugocisneros.com/notes/ai_capitalism/</link><pubDate>Mon, 02 May 2022 20:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ai_capitalism/</guid><description> tags Capitalism, Artificial Intelligence (Verdegem 2022)
Bibliography Pieter Verdegem. April 9, 2022. "Dismantling AI Capitalism: The Commons as an Alternative to the Power Concentration of Big Tech". AI &amp; SOCIETY. DOI.</description></item><item><title>Capitalism</title><link>https://hugocisneros.com/notes/capitalism/</link><pubDate>Mon, 02 May 2022 20:01:00 +0200</pubDate><guid>https://hugocisneros.com/notes/capitalism/</guid><description> tags Economics</description></item><item><title>Text classification</title><link>https://hugocisneros.com/notes/text_classification/</link><pubDate>Mon, 02 May 2022 10:43:00 +0200</pubDate><guid>https://hugocisneros.com/notes/text_classification/</guid><description> tags NLP resources (Minaee et al. 2020) A few examples are often cited as major applications of text classification:
Spam detection Sentiment analysis Auto-tagging Categorization into topics Bibliography Shervin Minaee, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, Jianfeng Gao. April 5, 2020. "Deep Learning Based Text Classification: A Comprehensive Review". http://arxiv.org/abs/2004.03705.</description></item><item><title>Amorphous computing</title><link>https://hugocisneros.com/notes/amorphous_computing/</link><pubDate>Mon, 02 May 2022 10:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/amorphous_computing/</guid><description>tags Unconventional computing, Self-organization papers (Abelson et al. 2000) resources Wikipedia, CSAIL&amp;rsquo;s website The term amorphous computing was coined by Abelson, Knight, Sussman et al. It refers to computational systems composed of a large number of identical parallel devices (processors) with limited computational capacity. The processors interact locally, without particular knowledge of their position in the medium.
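One primitive often discussed in the amorphous computing literature is a hop-count gradient. A source processor holds 0, and every other processor repeatedly takes 1 plus the smallest value among the processors within its communication radius. Here is a minimal simulation sketch; the randomly scattered processors, the radius, and the helper name are arbitrary illustrative choices.

```python
import math
import random

def hop_count_gradient(points, radius, source=0):
    """Relax hop counts from `source` using only local, radius-limited links.

    `points` are positions used solely by this simulator to decide which
    processors can hear each other; the update rule itself never reads them.
    """
    n = len(points)
    neighbors = [[j for j in range(n) if j != i and radius >= math.dist(points[i], points[j])]
                 for i in range(n)]
    value = [math.inf] * n
    value[source] = 0
    changed = True
    while changed:  # repeat local updates until no processor changes its value
        changed = False
        for i in range(n):
            if i == source:
                continue
            best = min((value[j] for j in neighbors[i]), default=math.inf) + 1
            if value[i] > best:
                value[i] = best
                changed = True
    return value

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]  # scattered processors
grad = hop_count_gradient(pts, radius=0.15)
print(max(v for v in grad if v != math.inf))  # hop distance of the farthest reachable processor
```

Each processor only reads its neighbors' values, yet the resulting numbers approximate distance from the source. This is the kind of position-free coordination the definition above describes.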
From (Abelson et al. 2000):
A colony of cells cooperates to form a multicellular organism under the direction of a genetic program shared by the members of the colony.</description></item><item><title>Ising model</title><link>https://hugocisneros.com/notes/ising_model/</link><pubDate>Mon, 02 May 2022 10:25:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ising_model/</guid><description>tags Complex Systems, Physics Simulation of an Ising model by a cellular automaton Several works have proposed and later refined cellular automaton-based algorithms for simulating Ising models (Vichniac 1984; Herrmann 1986; Ottavi, Parodi 1989).
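One deterministic rule of this kind is often called Q2R. Spins sit on a square lattice, and the two checkerboard sublattices are updated alternately. A spin flips only when its four neighbors sum to zero, so each flip costs no energy. Here is a minimal, unoptimized sketch of that rule in plain Python; the helper names are illustrative, and it is not claimed to match any one of the cited algorithms exactly.

```python
import random

def neighbor_sum(grid, i, j):
    """Sum of the four nearest-neighbor spins, with periodic boundaries."""
    n = len(grid)
    return (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
            + grid[i][(j - 1) % n] + grid[i][(j + 1) % n])

def energy(grid):
    """Ising energy with coupling J = 1: minus the sum of s_i * s_j over bonds."""
    n = len(grid)
    total = 0
    for i in range(n):
        for j in range(n):
            # Count each bond once by only looking down and right.
            total -= grid[i][j] * (grid[(i + 1) % n][j] + grid[i][(j + 1) % n])
    return total

def q2r_step(grid, parity):
    """Update the sublattice where (i + j) % 2 == parity.

    A spin flips exactly when its neighbors sum to zero (two up, two down),
    since such a flip leaves the energy unchanged.
    """
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            if (i + j) % 2 == parity and neighbor_sum(grid, i, j) == 0:
                new[i][j] = -grid[i][j]
    return new

random.seed(0)
n = 8  # even side length keeps the checkerboard consistent with periodic boundaries
grid = [[random.choice([-1, 1]) for _ in range(n)] for _ in range(n)]
e0 = energy(grid)
for t in range(50):
    grid = q2r_step(grid, t % 2)
print(energy(grid) == e0)  # True: the rule conserves energy exactly
```

The checkerboard schedule is what keeps the update consistent. All neighbors of a spin belong to the other sublattice, so they stay fixed while that spin is updated.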
Bibliography Gérard Y. Vichniac. January 1, 1984. "Simulating Physics with Cellular Automata". Physica D: Nonlinear Phenomena 10 (1):96–116. DOI. H. J. Herrmann. October 1, 1986. "Fast Algorithm for the Simulation of Ising Models". Journal of Statistical Physics 45 (1):145–51.</description></item><item><title>You and your research</title><link>https://hugocisneros.com/notes/you_and_your_research/</link><pubDate>Sun, 01 May 2022 20:37:00 +0200</pubDate><guid>https://hugocisneros.com/notes/you_and_your_research/</guid><description>tags Writing source Web In the first place if you do some good work you will find yourself on all kinds of committees and unable to do any more work.</description></item><item><title>The Meta-Problem of Consciousness with David Chalmers</title><link>https://hugocisneros.com/notes/the_meta_problem_of_consciousness_with_david_chalmers/</link><pubDate>Wed, 27 Apr 2022 10:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/the_meta_problem_of_consciousness_with_david_chalmers/</guid><description>tags Consciousness link Youtube The hard problem of consciousness Why and how physical processes give rise to consciousness. This often refers to phenomenal consciousness or &amp;ldquo;what it&amp;rsquo;s like to be a subject&amp;rdquo;.
Phenomenally conscious:
For a system, if there is something it&amp;rsquo;s like to be it. For a mental state, if there is something it&amp;rsquo;s like to be in that state. This includes visual and other sensory experiences, bodily sensations, mental imagery, emotions</description></item><item><title>Nix</title><link>https://hugocisneros.com/notes/nix/</link><pubDate>Tue, 26 Apr 2022 21:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/nix/</guid><description>tags Coding, Programming languages https://github.com/cideM/dotfiles</description></item><item><title>Knowledge argument</title><link>https://hugocisneros.com/notes/knowledge_argument/</link><pubDate>Tue, 26 Apr 2022 18:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/knowledge_argument/</guid><description>tags Philosophy The thought experiment as formulated by Frank Jackson in (Jackson 1982):
Mary is a brilliant scientist who is, for whatever reason, forced to investigate the world from a black and white room via a black and white television monitor. She specializes in the neurophysiology of vision and acquires, let us suppose, all the physical information there is to obtain about what goes on when we see ripe tomatoes, or the sky, and use terms like &amp;ldquo;red&amp;rdquo;, &amp;ldquo;blue&amp;rdquo;, and so on.</description></item><item><title>Notes on: GARF: Gaussian Activated Radiance Fields for High Fidelity Reconstruction and Pose Estimation by Chng, S., Ramasinghe, S., Sherrah, J., &amp; Lucey, S. (2022)</title><link>https://hugocisneros.com/notes/chnggarfgaussianactivated2022/</link><pubDate>Tue, 26 Apr 2022 13:08:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chnggarfgaussianactivated2022/</guid><description>tags Implicit neural representations, NeRF source (Chng et al. 2022) web https://sfchng.github.io/garf/ DONE Summary This paper introduces a NeRF architecture without positional embeddings, which uses Gaussian activation functions instead. These activation functions were introduced as part of Gaussian-MLPs in (Ramasinghe, Lucey 2022).
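For reference, a common form of the Gaussian activation is exp(-x^2 / (2 sigma^2)), where sigma is a width hyperparameter. The sketch below applies it after a single dense layer on plain Python lists. The helper names and the default sigma are illustrative assumptions, not values from GARF. The input coordinates are fed in directly, with no positional embedding.

```python
import math

def gaussian(x, sigma=0.1):
    """Gaussian activation exp(-x^2 / (2 sigma^2)); peaks at 1 when x = 0."""
    return math.exp(-x * x / (2.0 * sigma * sigma))

def gaussian_layer(v, weights, biases, sigma=0.1):
    """Dense layer followed by the Gaussian activation (hypothetical helper)."""
    return [gaussian(sum(w * x for w, x in zip(row, v)) + b, sigma)
            for row, b in zip(weights, biases)]

# Raw coordinates go straight in: no positional encoding is applied first.
print(gaussian_layer([0.2, -0.1], weights=[[1.0, 2.0], [0.5, 0.0]], biases=[0.0, -0.1]))
```

Unlike ReLU, this activation is smooth everywhere. Its width sigma controls how sharply each unit responds to changes in its input.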
This alternative activation function enables GARF to model first derivatives of the target signal better than positional-embedding MLPs (PE-MLPs) (Mildenhall et al. 2020; Sitzmann et al. 2020). It also overcomes the initialization issues with SIRENs (Sitzmann et al.</description></item><item><title>Implicit neural representations</title><link>https://hugocisneros.com/notes/implicit_neural_representations/</link><pubDate>Tue, 26 Apr 2022 12:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/implicit_neural_representations/</guid><description>tags Data representation, Neural networks resources Sitzmann&amp;rsquo;s Awesome Implicit Neural Representations github page Implicit neural representations parameterize a continuous, differentiable signal with a neural network. The signal is encoded within the neural network, providing a possibly more compact representation or allowing smooth parameter-based manipulation of that signal. This is a type of regression problem.
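Here is a minimal sketch of the regression view, with two simplifications that are assumptions here. First, the "network" is a single linear layer on top of a fixed Fourier-feature (positional) encoding of the coordinate. Second, it is fit in closed form with least squares instead of gradient descent. Once fit, the representation can be queried at coordinates that were never sampled.

```python
import numpy as np

def fourier_features(x, num_freqs=4):
    """Encode scalar coordinates as [1, sin(2 pi k x), cos(2 pi k x)] for k = 1..num_freqs."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    k = np.arange(1, num_freqs + 1).reshape(1, -1)
    return np.hstack([np.ones_like(x), np.sin(2 * np.pi * k * x), np.cos(2 * np.pi * k * x)])

def fit_signal(xs, ys, num_freqs=4):
    """Least-squares fit of a linear readout on the encoded coordinates."""
    weights, *_ = np.linalg.lstsq(fourier_features(xs, num_freqs), ys, rcond=None)
    return lambda x: fourier_features(x, num_freqs) @ weights

xs = np.linspace(0.0, 1.0, 64, endpoint=False)  # discrete samples of the signal
ys = np.sin(2 * np.pi * xs) + 0.5 * np.cos(6 * np.pi * xs)
f = fit_signal(xs, ys)
print(f([0.3721]))  # the fitted representation can be queried between samples
```

A full implicit neural representation would replace the fixed encoding and linear readout with a trained MLP. The interface stays the same: a function from coordinates to signal values.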
Applications of these learned representations range from simple compression to 3D scene reconstruction from 2D images, super-resolution, semantic information inference, etc.</description></item><item><title>Schmidhuber on Consciousness</title><link>https://hugocisneros.com/notes/schmidhuber_on_consciousness/</link><pubDate>Tue, 26 Apr 2022 12:08:00 +0200</pubDate><guid>https://hugocisneros.com/notes/schmidhuber_on_consciousness/</guid><description>tags Consciousness link Reddit comment From the Reddit comment (links are added by me):
Karl Popper famously said: “All life is problem solving.” No theory of consciousness is necessary to define the objectives of a general problem solver. From an AGI point of view, consciousness is at best a by-product of a general problem solving procedure.
I must admit that I am not a big fan of Tononi&amp;rsquo;s theory. The following may represent a simpler and more general view of consciousness.</description></item><item><title>Consciousness</title><link>https://hugocisneros.com/notes/consciousness/</link><pubDate>Tue, 26 Apr 2022 10:39:00 +0200</pubDate><guid>https://hugocisneros.com/notes/consciousness/</guid><description> tags Philosophy</description></item><item><title>Time to threshold</title><link>https://hugocisneros.com/notes/time_to_threshold/</link><pubDate>Sun, 24 Apr 2022 13:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/time_to_threshold/</guid><description>tags Transfer learning, Reinforcement learning This is a simple metric first mentioned in (Taylor et al. 2007; Taylor, Stone 2007). In the paper by Taylor, Stone, and Liu, it is defined as:
Time-to-Threshold: Measure the time needed to reach a performance threshold in the target task.
In other words, this metric measures the time spent to reach a target performance for a given learning system.
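Here is a minimal sketch of the metric on recorded learning curves. It assumes performance is logged as (time, performance) pairs in increasing time order, and the curves below are hypothetical numbers for illustration. Comparing the value for a learner trained from scratch with one that uses transfer shows how much sooner transfer reaches the threshold.

```python
def time_to_threshold(curve, threshold):
    """Return the first time at which performance reaches `threshold`, or None.

    `curve` is a list of (time, performance) pairs sorted by time; the time unit
    (episodes, samples, wall-clock) is whatever the experiment records.
    """
    for t, perf in curve:
        if perf >= threshold:
            return t
    return None

baseline = [(0, 0.1), (100, 0.4), (200, 0.7), (300, 0.9)]   # hypothetical, from scratch
transfer = [(0, 0.3), (100, 0.8), (200, 0.9), (300, 0.95)]  # hypothetical, with transfer
print(time_to_threshold(baseline, 0.8), time_to_threshold(transfer, 0.8))  # 300 100
```

Returning `None` when the threshold is never reached keeps that case distinct from a learner that reaches it at time 0.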
To write down this metric, we use a</description></item><item><title>Pattern-defeating quicksort</title><link>https://hugocisneros.com/notes/pattern_defeating_quicksort/</link><pubDate>Sun, 24 Apr 2022 13:19:00 +0200</pubDate><guid>https://hugocisneros.com/notes/pattern_defeating_quicksort/</guid><description> tags Algorithm resources Youtube, (Peters 2021) This is a sorting algorithm based on the well-known quicksort algorithm. It adds a number of optimizations on top of the base algorithm:
Pivot selection Branchless partitioning Insertion sort base case Bounds check elimination Optimistic pre-sortedness Many equal values Breaking self-similarity \(O(n^2)\) worst-case prevention Bibliography Orson R. L. Peters. June 9, 2021. "Pattern-Defeating Quicksort". http://arxiv.org/abs/2106.05123.</description></item><item><title>Chaos computing</title><link>https://hugocisneros.com/notes/chaos_computing/</link><pubDate>Tue, 19 Apr 2022 17:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chaos_computing/</guid><description> tags Unconventional computing resources (Munakata et al. 2002) Bibliography T. Munakata, S. Sinha, W.L. Ditto. November 2002. "Chaos Computing: Implementation of Fundamental Logical Gates by Chaotic Elements". IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications 49 (11):1629–33. DOI.</description></item><item><title>PDF</title><link>https://hugocisneros.com/notes/pdf/</link><pubDate>Tue, 19 Apr 2022 16:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/pdf/</guid><description> tags Writing Tips Make a PDF look scanned
convert -density 150 input.pdf -colorspace gray -linear-stretch 3.5%x10% -blur 0x0.5 \
    -attenuate 0.25 +noise Gaussian -rotate 0.5 temp.pdf
gs -dSAFER -dBATCH -dNOPAUSE \
    -dNOCACHE -sDEVICE=pdfwrite -sColorConversionStrategy=LeaveColorUnchanged \
    -dAutoFilterColorImages=true -dAutoFilterGrayImages=true -dDownsampleMonoImages=true \
    -dDownsampleGrayImages=true -dDownsampleColorImages=true -sOutputFile=output.pdf temp.pdf</description></item><item><title>Combinatory logic</title><link>https://hugocisneros.com/notes/combinatory_logic/</link><pubDate>Tue, 19 Apr 2022 16:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/combinatory_logic/</guid><description> tags Logic papers (Cardone, Hindley 2009) It was independently invented by Moses Schönfinkel, John von Neumann and Haskell Curry.
A Turing-complete basis of operators is:
\(I\,f \quad\triangleright\quad f\) \(K\,f\,g \quad\triangleright\quad f\) \(S\,f\,g\,x \quad\triangleright\quad f\,x\,(g\,x)\) Bibliography Felice Cardone, J. Roger Hindley. 2009. "Lambda-Calculus and Combinators in the 20th Century". In Logic from Russell to Church, edited by Dov M. Gabbay and John Woods, 5:723–817. Elsevier. DOI.</description></item><item><title>Artificial life</title><link>https://hugocisneros.com/notes/artificial_life/</link><pubDate>Tue, 19 Apr 2022 14:42:00 +0200</pubDate><guid>https://hugocisneros.com/notes/artificial_life/</guid><description>Artificial life could be thought of as attempts at re-creating biological life or other types of life. It draws on different fields such as biology, physics, chemistry, computer science, etc.
Creating artificial life seems like a possible way to create AI, since most living systems on Earth seem to exhibit some form of robust intelligent behavior.
Definition of artificial life from Carlos Gershenson in (Gershenson 2021):
Beginning in the mid-1980s, ALife has studied living systems using a synthetic approach: building life to understand it better (Aguilar et al.</description></item><item><title>Notes on: Evolving a self-repairing, self-regulating, French flag organism by Miller, J. F. (2004)</title><link>https://hugocisneros.com/notes/millerevolvingselfrepairingselfregulating2004/</link><pubDate>Tue, 19 Apr 2022 14:36:00 +0200</pubDate><guid>https://hugocisneros.com/notes/millerevolvingselfrepairingselfregulating2004/</guid><description> tags Cellular automata, Self-organization, Evolutionary strategies source (Miller 2004) TODO Summary TODO Comments Bibliography Julian Francis Miller. 2004. "Evolving a Self-Repairing, Self-Regulating, French Flag Organism". In Genetic and Evolutionary Computation Conference, 129–39. Springer. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.98.1049&amp;rep=rep1&amp;type=pdf.</description></item><item><title>Few-shot learning</title><link>https://hugocisneros.com/notes/few_shot_learning/</link><pubDate>Tue, 19 Apr 2022 13:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/few_shot_learning/</guid><description>tags Machine learning, Transfer learning resources AI Multiple post Few-shot learning (FSL) can be considered as a kind of meta-learning problem where the model learns how to learn to solve different problems.
FSL tasks are referred to as N-way K-shot tasks, where N is the number of classes in each task and K is the number of labeled examples available per class. At test time, the model only sees K examples of each of the N new classes it has to learn.</description></item><item><title>Zero-shot learning</title><link>https://hugocisneros.com/notes/zero_shot_learning/</link><pubDate>Tue, 19 Apr 2022 13:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/zero_shot_learning/</guid><description>tags Few-shot learning, Machine learning, Transfer learning Zero-shot learning is a type of learning task where a model has to make predictions in an output space never seen during training. In the case of image classification, this corresponds to classifying images into classes never seen during training.</description></item><item><title>Computer vision</title><link>https://hugocisneros.com/notes/computer_vision/</link><pubDate>Tue, 19 Apr 2022 13:24:00 +0200</pubDate><guid>https://hugocisneros.com/notes/computer_vision/</guid><description> tags Machine learning, Image processing Task Image classification Semantic segmentation Object detection Image generation</description></item><item><title>Cryptography</title><link>https://hugocisneros.com/notes/cryptography/</link><pubDate>Sun, 17 Apr 2022 13:13:00 +0200</pubDate><guid>https://hugocisneros.com/notes/cryptography/</guid><description> tags Applied maths, Computer science</description></item><item><title>Applied maths</title><link>https://hugocisneros.com/notes/applied_maths/</link><pubDate>Sun, 17 Apr 2022 13:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/applied_maths/</guid><description> tags Mathematics</description></item><item><title>Emergence</title><link>https://hugocisneros.com/notes/emergence/</link><pubDate>Sun, 17 Apr 2022 13:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/emergence/</guid><description> tags Complexity, 
Physics</description></item><item><title>Attractor</title><link>https://hugocisneros.com/notes/attractor/</link><pubDate>Thu, 14 Apr 2022 20:05:00 +0200</pubDate><guid>https://hugocisneros.com/notes/attractor/</guid><description> tags Dynamical systems, Physics</description></item><item><title>Notes on: Evolution in asynchronous cellular automata by Nehaniv, C. L. (2003)</title><link>https://hugocisneros.com/notes/nehanivevolutionasynchronouscellular2003/</link><pubDate>Thu, 14 Apr 2022 19:32:00 +0200</pubDate><guid>https://hugocisneros.com/notes/nehanivevolutionasynchronouscellular2003/</guid><description>tags Cellular automata, Evolution source (Nehaniv 2003) Summary This paper proposes a general asynchronous extension of CA rules and shows that it can be made equivalent to the original CA rule. Applying this extension to H. Sayama&amp;rsquo;s Evoloop cellular automaton (Sayama 1999), the author obtains the first asynchronous implementation of evolving self-replicators.
One hope expressed by the author is that asynchronicity could help achieve fault tolerance and self-repair, both of which are notoriously difficult to obtain in CA in general.</description></item><item><title>Emacs</title><link>https://hugocisneros.com/notes/emacs/</link><pubDate>Thu, 14 Apr 2022 18:17:00 +0200</pubDate><guid>https://hugocisneros.com/notes/emacs/</guid><description>tags Coding Emacs is many things, including a general-purpose text editor used for writing code or any other text.
Emacs uses Emacs Lisp (Elisp) for configuration and scripting.
Tips Delete buffers in helm view In helm buffer list view, individual buffers can be selected with C-Space. Once the buffers you want to delete are selected, M-D will delete them and close helm.</description></item><item><title>Christopher Langton</title><link>https://hugocisneros.com/notes/christopher_langton/</link><pubDate>Tue, 12 Apr 2022 14:15:00 +0200</pubDate><guid>https://hugocisneros.com/notes/christopher_langton/</guid><description>tags Artificial life, Complex Systems He is a researcher in the field of artificial life and complex systems. He developed many tools and systems that the Alife community still uses today.
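Langton's \(\lambda\) for a rule table is the fraction of neighborhood configurations that map to a non-quiescent state. The small sketch below makes two assumptions: state 0 is the quiescent state, and the examples are elementary (2-state, 3-neighbor) rules in Wolfram numbering. The helper names are hypothetical.

```python
import itertools


def eca_rule_table(rule_number):
    """Map each neighborhood (left, center, right) to its output bit (Wolfram numbering)."""
    return {
        nbhd: (rule_number >> (4 * nbhd[0] + 2 * nbhd[1] + nbhd[2])) % 2
        for nbhd in itertools.product((0, 1), repeat=3)
    }


def langton_lambda(rule_table, quiescent=0):
    """Fraction of neighborhood configurations mapped to a non-quiescent state."""
    non_quiescent = sum(1 for out in rule_table.values() if out != quiescent)
    return non_quiescent / len(rule_table)


print(langton_lambda(eca_rule_table(110)))  # 0.625 (5 of the 8 entries are 1)
print(langton_lambda(eca_rule_table(30)))   # 0.5
```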
He studied many properties of cellular automata, including how the \(\lambda\) parameter (the fraction of rule-table entries that map to a non-quiescent state) influences their chaotic behavior. A phase transition occurs around certain values of \(\lambda\), a regime known as the edge of chaos (Langton 1990).</description></item><item><title>Language modeling</title><link>https://hugocisneros.com/notes/language_modeling/</link><pubDate>Mon, 11 Apr 2022 15:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/language_modeling/</guid><description>tags NLP LM with RNNs Different models have been studied, starting with the original recurrent neural network based language model (Mikolov et al. 2011). Recurrent neural networks
LSTMs were then used with more success than previous models (Zaremba et al. 2015).
Transformers have recently come to dominate language modeling. However, it is not clear whether this is due to their real superiority over RNNs or to their practical scalability (Merity 2019).</description></item><item><title>Word vectors</title><link>https://hugocisneros.com/notes/word_vectors/</link><pubDate>Thu, 07 Apr 2022 19:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/word_vectors/</guid><description>tags NLP Definition Word vectors are abstract representations of words embedded in a dense space.
They are closely related to Language modeling, since the implicit representation a language model builds for prediction can often be used as a word (or sentence) vector.
Word vectors can be extracted from the intermediate representations of RNNs or transformers. They can also be created with dedicated algorithms such as Word2Vec.
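Semantic similarity between word vectors is commonly measured with cosine similarity. The sketch below uses made-up 3-dimensional toy vectors; real embeddings such as Word2Vec have hundreds of dimensions. `most_similar` is a hypothetical helper.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 for identical directions."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def most_similar(word, vectors):
    """Return the other word whose vector has the highest cosine similarity."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )


# Toy vectors, invented for illustration only.
vectors = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}
print(most_similar("cat", vectors))  # dog
```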
Usage Word vectors can encode interesting information, such as semantic similarity between words.</description></item><item><title>Talk: Alife 2020 keynote Luis Zaman - New Frontiers in Alife: What was old is new again</title><link>https://hugocisneros.com/notes/talk_alife_2020_keynote_luis_zaman_new_frontiers_in_alife_what_was_old_is_new_again/</link><pubDate>Thu, 07 Apr 2022 19:32:00 +0200</pubDate><guid>https://hugocisneros.com/notes/talk_alife_2020_keynote_luis_zaman_new_frontiers_in_alife_what_was_old_is_new_again/</guid><description>tags Artificial life, ALife 2020 This keynote is about the sub-community of ALife dedicated to constructing actual artificial systems that can exhibit Open-ended Evolution and Life-like behavior.
The first models that tried to construct ALife were probably cellular automata: von Neumann&amp;rsquo;s self-reproducing CA and Langton&amp;rsquo;s loops. However, their main limitation was that they were extremely brittle, which is why evolution did not really work in them.
Zaman&amp;rsquo;s definition of evolution is</description></item><item><title>Neural networks</title><link>https://hugocisneros.com/notes/neural_networks/</link><pubDate>Wed, 06 Apr 2022 13:48:00 +0200</pubDate><guid>https://hugocisneros.com/notes/neural_networks/</guid><description>tags Machine learning Two-layer neural network Mathematically, a simple two-layer neural network with \(m\) hidden units and ReLU non-linearities can be written as below. For an input vector \(x \in \mathbb{R}^D\), \(\mathbf{a} = (a_1, \cdots, a_m)\in \mathbb{R}^m\) are the output weights and \(b_1, \cdots, b_m \in \mathbb{R}^D\) are the input weights:
\[ h(x) = \frac{1}{m} \sum_{i=1}^m a_i \max\{ b_i^\top x,0\}, \]
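The formula translates directly into NumPy. This sketch assumes the input weights \(b_i\) are stacked as the rows of an \(m \times D\) matrix `B`. `two_layer_net` is an illustrative name.

```python
import numpy as np


def two_layer_net(x, a, B):
    """h(x) = (1/m) * sum_i a_i * max(b_i^T x, 0), with b_i the rows of B."""
    m = a.shape[0]
    hidden = np.maximum(B @ x, 0.0)  # ReLU of the m pre-activations b_i^T x
    return float(a @ hidden) / m


x = np.array([1.0, -1.0])
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # m = 3 hidden units, D = 2
a = np.array([2.0, 3.0, 4.0])
print(two_layer_net(x, a, B))  # 2/3: only the first hidden unit is active
```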
Universal approximation theorem Cybenko showed in 1989 that a neural network with a single hidden layer of arbitrary width and sigmoid activations can approximate any continuous function on a compact subset of \(\mathbb{R}^n\) to arbitrary accuracy (Cybenko 1989).</description></item><item><title>Org-roam</title><link>https://hugocisneros.com/notes/org_roam/</link><pubDate>Wed, 06 Apr 2022 13:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/org_roam/</guid><description>tags Emacs, Org-mode, Writing website Org-roam Org-roam is an org-mode package that implements features similar to Roam Research:
Bi-directional links between notes &amp;mdash; a link from a note to another note is also a &amp;ldquo;backlink&amp;rdquo; from the latter to the former. Possibility to reference whole notes or subtrees transparently Citation and references as links And many others It was created by Jethro Kuan.
Org-roam is useful for managing a personal knowledge base in plain text while taking advantage of the powerful features of Org-mode.</description></item><item><title>Rust</title><link>https://hugocisneros.com/notes/rust/</link><pubDate>Wed, 06 Apr 2022 13:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/rust/</guid><description> tags Programming languages, Coding Interesting repos The Rust Python interpreter: link</description></item><item><title>Writing</title><link>https://hugocisneros.com/notes/writing/</link><pubDate>Wed, 06 Apr 2022 12:59:00 +0200</pubDate><guid>https://hugocisneros.com/notes/writing/</guid><description/></item><item><title>Kuznets curve</title><link>https://hugocisneros.com/notes/kuznets_curve/</link><pubDate>Tue, 05 Apr 2022 15:52:00 +0200</pubDate><guid>https://hugocisneros.com/notes/kuznets_curve/</guid><description>tags Climate, Economics From (Henriques, Böhm 2022):
This is based on the work of the U.S. American economist Simon Kuznets who, in the 1950s and 1960s, argued that social inequalities first increase with rising economic growth before decreasing over time (Kuznets&amp;rsquo; 1955).
Kuznets curve for pollution and climate science Bibliography Irene Henriques, Steffen Böhm. May 2022. "The Perils of Ecologically Unequal Exchange: Contesting Rare-Earth Mining in Greenland". Journal of Cleaner Production 349 (May):131378.</description></item><item><title>Dirichlet distribution</title><link>https://hugocisneros.com/notes/dirichlet_distribution/</link><pubDate>Thu, 31 Mar 2022 12:39:00 +0200</pubDate><guid>https://hugocisneros.com/notes/dirichlet_distribution/</guid><description> tags Probabilities</description></item><item><title>Probabilities</title><link>https://hugocisneros.com/notes/probabilities/</link><pubDate>Thu, 31 Mar 2022 12:39:00 +0200</pubDate><guid>https://hugocisneros.com/notes/probabilities/</guid><description> tags Mathematics</description></item><item><title>Reservoir computing</title><link>https://hugocisneros.com/notes/reservoir_computing/</link><pubDate>Tue, 15 Mar 2022 11:41:00 +0100</pubDate><guid>https://hugocisneros.com/notes/reservoir_computing/</guid><description>tags Machine learning, Unconventional computing, Unsupervised learning Reservoir computing is a term used to describe a class of machine learning algorithms that rely on transient dynamics of a dynamical system to implement and manipulate goal-related information.
The best-known example is echo-state networks, which use random recurrent neural networks as reservoirs, but other dynamical systems can also be used.
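A minimal echo-state network sketch. A fixed random recurrent reservoir of tanh units is driven by the input, and only a linear readout is trained, here by ridge regression. The reservoir size, spectral radius 0.9, input scaling, washout length and one-step sine prediction task are illustrative assumptions, not prescribed values.

```python
import numpy as np


def run_reservoir(u, n_res=200, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Drive a fixed random tanh reservoir with the scalar sequence u; return all states."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-input_scale, input_scale, size=n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # rescale recurrent weights
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + W_in * u_t)
        states[t] = x
    return states


def fit_readout(states, targets, ridge=1e-6):
    """Ridge regression: the linear readout is the only trained part."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)


u = np.sin(0.1 * np.arange(1000))
states = run_reservoir(u[:-1])          # states[t] has seen inputs up to u[t]
washout = 100                           # discard the initial transient
w_out = fit_readout(states[washout:], u[washout + 1:])
pred = states[washout:] @ w_out
print(np.mean((pred - u[washout + 1:]) ** 2))  # training MSE, expected to be very small
```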
Reservoir computing with cellular automata Reservoir computing can use cellular automata as the reservoir.</description></item><item><title>Unconventional computing</title><link>https://hugocisneros.com/notes/unconventional_computing/</link><pubDate>Tue, 15 Mar 2022 11:36:00 +0100</pubDate><guid>https://hugocisneros.com/notes/unconventional_computing/</guid><description>papers (Stepney 2012) Molecular computing (Kompa, Levine 2001)
DNA computing (Zhang, Ye 2012)
Light computing (Pittman et al. 2003)
Computing in physical material Computing (Fang et al. 2016)
(Stern et al. 2021)
Storing and encoding memory (Pashine et al. 2019)
(Chen et al. 2021)
Bibliography Susan Stepney. 2012. "Nonclassical Computation — A Dynamical Systems Perspective". In Handbook of Natural Computing, edited by Grzegorz Rozenberg, Thomas Bäck, and Joost N. Kok, 1979–2025.</description></item><item><title>Biological life</title><link>https://hugocisneros.com/notes/life/</link><pubDate>Tue, 15 Mar 2022 11:21:00 +0100</pubDate><guid>https://hugocisneros.com/notes/life/</guid><description>This note refers to the only biological life we&amp;rsquo;ve observed so far: the one on Earth.
From Sara Walker&amp;rsquo;s keynote at Alife 2020: The Natural History of Information:
Life is a process whereby information structures matter across space and time.
Why should life be an emergent process? An answer from (Krakauer et al. 2020):
The fact that physics and chemistry are universal—ongoing in stars, solar systems, and galaxies—whereas to the best of our knowledge biology is exclusively a property of earth, supports the view that life is emergent.</description></item><item><title>Elementary cellular automata</title><link>https://hugocisneros.com/notes/elementary_cellular_automata/</link><pubDate>Tue, 15 Mar 2022 11:02:00 +0100</pubDate><guid>https://hugocisneros.com/notes/elementary_cellular_automata/</guid><description>tags Cellular automata resources Wolfram Mathworld, Wikipedia ECA are among the simplest possible 1D cellular automata. The grid is a 1-dimensional array of cells in state 0 or 1 (dead or alive). The neighborhood used for the update has size 3 (the cell to the left, the cell itself and the cell to the right).
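One synchronous update can be sketched in NumPy. The sketch assumes periodic boundaries and Wolfram's rule numbering, in which the neighborhood (left, center, right), read as a 3-bit number, selects one bit of the 8-bit rule number. `eca_step` is a hypothetical name.

```python
import numpy as np


def eca_step(cells, rule_number):
    """One synchronous ECA update with periodic boundaries."""
    left = np.roll(cells, 1)      # left[i] = cells[i - 1]
    right = np.roll(cells, -1)    # right[i] = cells[i + 1]
    index = 4 * left + 2 * cells + right  # neighborhood as a number in 0..7
    return np.right_shift(rule_number, index) % 2


cells = np.zeros(7, dtype=int)
cells[3] = 1
print(eca_step(cells, 90))  # [0 0 1 0 1 0 0]: rule 90 is left XOR right
```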
Each of those \(2^3\) neighborhoods of size 3 can be mapped to either state 1 or 0, so there are \(2^{2^3} = 256\) possible ECA rules.</description></item><item><title>Gradient descent</title><link>https://hugocisneros.com/notes/gradient_descent/</link><pubDate>Mon, 07 Mar 2022 16:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/gradient_descent/</guid><description>tags Optimization, Algorithm resources Slides by Christian S. Perone Fixed learning rate The simplest way to apply the gradient descent algorithm to a convex, \(L\)-smooth function \(g\) on \(\mathbb{R}^d\) is to use the parameter update:
\[ \theta_t = \theta_{t-1} - \gamma \nabla g(\theta_{t-1}) \]
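A sketch of this fixed-step update on the quadratic \(g(\theta) = \frac{1}{2}\theta^\top H \theta\) with \(H = \mathrm{diag}(1, 10)\). This function is convex and \(L\)-smooth with \(L = 10\). The step \(\gamma = 1/L\) is a standard safe choice in this setting. The function and values are illustrative.

```python
import numpy as np


def gradient_descent(grad, theta0, step_size, n_steps):
    """Fixed-step gradient descent: theta_t = theta_{t-1} - step_size * grad(theta_{t-1})."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        theta = theta - step_size * grad(theta)
    return theta


H = np.diag([1.0, 10.0])  # curvatures 1 and 10, so L = 10


def grad(theta):
    return H @ theta


theta_good = gradient_descent(grad, [1.0, 1.0], step_size=0.1, n_steps=200)  # gamma = 1/L
theta_bad = gradient_descent(grad, [1.0, 1.0], step_size=0.25, n_steps=20)   # gamma above 2/L
print(theta_good)  # close to the minimizer [0, 0]
print(theta_bad)   # second coordinate has blown up
```

With \(\gamma = 0.25 > 2/L\), the second coordinate is multiplied by \(1 - 0.25 \cdot 10 = -1.5\) at each step, so the iterates diverge. This illustrates the sensitivity to the learning rate.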
This is based on the standard first-order approximation of the function \(g\). It can be very sensitive to the learning rate and suffer from pathological curvature.</description></item><item><title>Rademacher complexity</title><link>https://hugocisneros.com/notes/rademacher_complexity/</link><pubDate>Mon, 07 Mar 2022 16:44:00 +0100</pubDate><guid>https://hugocisneros.com/notes/rademacher_complexity/</guid><description>tags Machine learning Definition Given a function class \(f_w\) and iid random signs \(y_\mu\) drawn uniformly from \(\{\pm 1\}\), the Rademacher complexity is \[ \mathscr{R}_n = \mathbb{E}_{y, X} \sup_w \frac{1}{n} \sum_{\mu = 1}^n y_\mu f_w(X_\mu) \]
It measures how well the function class can fit a dataset with random labels.
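The empirical version can be estimated by Monte Carlo. It holds the inputs \(X\) fixed, so only the expectation over \(y\) remains. This sketch assumes the class of linear functions \(f_w(x) = w^\top x\) with \(\|w\|_2 \le B\). For this class, Cauchy-Schwarz gives the supremum in closed form: \(\frac{B}{n}\|\sum_\mu y_\mu X_\mu\|_2\). The function name and the choice of class are assumptions for illustration.

```python
import numpy as np


def empirical_rademacher_linear(X, norm_bound, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_y sup_w (1/n) sum_mu y_mu w^T x_mu, for ||w||_2 at most norm_bound."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    y = rng.choice([-1.0, 1.0], size=(n_draws, n))  # one row of random signs per draw
    # Closed-form supremum for a norm-bounded linear class: (B / n) * ||sum_mu y_mu x_mu||_2.
    return float(np.mean(norm_bound / n * np.linalg.norm(y @ X, axis=1)))


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
print(empirical_rademacher_linear(X, norm_bound=1.0))  # at most 1/sqrt(100) = 0.1 in expectation
```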
(Bartlett, Mendelson 2002) shows bounds for the Rademacher complexity in terms of \(\ell_1\) norm bounds on the weights of the network. However, (Zhang et al.</description></item><item><title>Compressed sensing</title><link>https://hugocisneros.com/notes/compressed_sensing/</link><pubDate>Mon, 07 Mar 2022 16:07:00 +0100</pubDate><guid>https://hugocisneros.com/notes/compressed_sensing/</guid><description>tags Signal processing resources (Candes et al. 2006) Description Compressed sensing is a technique to recover a sparse signal from partial observations.
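Recovery can be sketched with a greedy method. The example below uses orthogonal matching pursuit (OMP), swapped in for illustration in place of the convex \(\ell_1\) programs usually associated with compressed sensing. Measurements are \(\textbf{y} = \textbf{Fs}\), with \(\textbf{F}\) a random Gaussian \(M \times N\) matrix and \(\textbf{s}\) an \(N\)-dimensional signal with only a few nonzero entries. Sizes, seed and sparsity level are arbitrary choices.

```python
import numpy as np


def omp(F, y, sparsity):
    """Orthogonal matching pursuit: greedily pick columns of F, refit by least squares."""
    residual = y.copy()
    support = []
    for _ in range(sparsity):
        # Column most correlated with what is still unexplained.
        correlations = np.abs(F.T @ residual)
        if support:
            correlations[support] = 0.0
        support.append(int(np.argmax(correlations)))
        coef, *_ = np.linalg.lstsq(F[:, support], y, rcond=None)
        residual = y - F[:, support] @ coef
    s_hat = np.zeros(F.shape[1])
    s_hat[support] = coef
    return s_hat


rng = np.random.default_rng(0)
N, M = 200, 80                       # signal dimension, number of measurements
s = np.zeros(N)
s[[5, 37, 150]] = [1.5, -2.0, 1.0]   # 3-sparse signal
F = rng.normal(size=(M, N)) / np.sqrt(M)
y = F @ s                            # M linear measurements
s_hat = omp(F, y, sparsity=3)
print(np.max(np.abs(s_hat - s)))     # reconstruction error
```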
The signal is described as an \(N\)-dimensional vector \(\textbf{s}\). We make \(M\) measurements, where a measurement is a projection of the signal \(\textbf{s}\) onto some known vector. The result of all these measurements can be written as \(\textbf{y} = \textbf{Fs}\), where \(\textbf{F}\) is an \(M \times N\) matrix whose rows are the measurement vectors.</description></item><item><title>Double descent</title><link>https://hugocisneros.com/notes/double_descent/</link><pubDate>Mon, 07 Mar 2022 15:59:00 +0100</pubDate><guid>https://hugocisneros.com/notes/double_descent/</guid><description>tags Neural network training, Neural networks resources (Belkin et al. 2019) Double descent is a phenomenon often observed in neural networks, where the classical bias-variance tradeoff seems to break down: test error keeps decreasing as we over-parametrize the network or add more training examples. This was observed for over-parametrized neural networks in (Geman et al. 1992).
An illustration from (caption is also adapted from the paper) (Belkin et al. 2019):</description></item><item><title>Bias-variance tradeoff</title><link>https://hugocisneros.com/notes/bias_variance_tradeoff/</link><pubDate>Mon, 07 Mar 2022 15:56:00 +0100</pubDate><guid>https://hugocisneros.com/notes/bias_variance_tradeoff/</guid><description> tags Machine learning, Statistics</description></item><item><title>Levenshtein automata</title><link>https://hugocisneros.com/notes/levenshtein_automata/</link><pubDate>Mon, 07 Mar 2022 09:17:00 +0100</pubDate><guid>https://hugocisneros.com/notes/levenshtein_automata/</guid><description>tags Algorithm, Finite state machines resources Nick&amp;rsquo;s blog This is an algorithm used to find strings within a given Levenshtein distance of a target word.</description></item><item><title>Levenshtein distance</title><link>https://hugocisneros.com/notes/levenshtein_distance/</link><pubDate>Mon, 07 Mar 2022 09:15:00 +0100</pubDate><guid>https://hugocisneros.com/notes/levenshtein_distance/</guid><description> tags Algorithm, Natural language processing</description></item><item><title>NLP</title><link>https://hugocisneros.com/notes/nlp/</link><pubDate>Mon, 07 Mar 2022 09:15:00 +0100</pubDate><guid>https://hugocisneros.com/notes/nlp/</guid><description> tags Machine learning, Language NLP is about creating algorithms that can manipulate and use language. It is often thought that having functioning NLP algorithms that provably &amp;ldquo;understand&amp;rdquo; language would be equivalent to reaching human-level Artificial Intelligence.
Tasks Language modeling Text classification Question answering Data manipulation There are several ways to encode text data.
One-hot encoding of characters One-hot encoding of words Byte-pair encoding, which can be seen as a compromise between the two</description></item><item><title>Finite state machines</title><link>https://hugocisneros.com/notes/finite_state_machines/</link><pubDate>Mon, 07 Mar 2022 09:05:00 +0100</pubDate><guid>https://hugocisneros.com/notes/finite_state_machines/</guid><description> tags Theory of computation Finite automata Finite state transducers Implementations State machines in Rust: link. In Python: python-statemachine</description></item><item><title>Notes on: Transformer Memory as a Differentiable Search Index by Tay, Y., Tran, V. Q., Dehghani, M., Ni, J., Bahri, D., Mehta, H., Qin, Z., … (2022)</title><link>https://hugocisneros.com/notes/taytransformermemorydifferentiable2022/</link><pubDate>Thu, 17 Feb 2022 15:46:00 +0100</pubDate><guid>https://hugocisneros.com/notes/taytransformermemorydifferentiable2022/</guid><description> source (Tay et al. 2022) TODO Summary TODO Comments Bibliography Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, et al. February 16, 2022. "Transformer Memory as a Differentiable Search Index". http://arxiv.org/abs/2202.06991.</description></item><item><title>Notes on: One model for the learning of language by Yang, Y., &amp; Piantadosi, S. T. (2022)</title><link>https://hugocisneros.com/notes/yangonemodellearning2022/</link><pubDate>Mon, 31 Jan 2022 13:55:00 +0100</pubDate><guid>https://hugocisneros.com/notes/yangonemodellearning2022/</guid><description>source (Yang, Piantadosi 2022) tags NLP, Artificial Intelligence, Machine learning DONE Summary This paper introduces a model for learning language from few examples while generalizing effectively.
This model builds sentences with functions composed of elementary functions, including:
pair(L, C) : Concatenates character C onto list L first(L) : Return the first character of L flip(P) : Return true with probability P if(B, X, Y) : Return X if B else return Y (X and Y may be lists, sets, or probabilities) etc.</description></item><item><title>Notes on: Next Generation Reservoir Computing by Gauthier, D. J., Bollt, E., Griffith, A., &amp; Barbosa, W. A. S. (2021)</title><link>https://hugocisneros.com/notes/gauthiernextgenerationreservoir2021/</link><pubDate>Sat, 29 Jan 2022 14:51:00 +0100</pubDate><guid>https://hugocisneros.com/notes/gauthiernextgenerationreservoir2021/</guid><description>source (Gauthier et al. 2021) tags Reservoir computing DONE Summary This paper builds on the demonstration that some reservoir computers (echo-state networks) are mathematically identical to nonlinear vector autoregression (NVAR) machines (Bollt 2021). An NVAR is simply a linear regression over a feature vector composed of \(k\) time-delay observations of the dynamical system to be learned and nonlinear functions of these observations.
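A sketch of such a feature vector, close in spirit to the NG-RC setup but not a transcription of it. It assumes a constant term, the \(k\) time-delayed observations spaced `skip` steps apart, and all unique quadratic monomials of those observations, followed by a ridge-regression readout. The logistic map used as the example is exactly quadratic in \(x_t\), so an NVAR with \(k = 1\) can fit it almost perfectly.

```python
import numpy as np


def nvar_features(x, k, skip=1):
    """Rows are [1, linear part, unique quadratic monomials] for each usable time t.

    x has shape (T, d); the linear part stacks x[t], x[t - skip], ..., x[t - (k - 1) * skip].
    """
    T = x.shape[0]
    start = (k - 1) * skip
    linear = np.hstack([x[start - lag * skip: T - lag * skip] for lag in range(k)])
    i, j = np.triu_indices(linear.shape[1])  # index pairs with i at most j (no duplicates)
    quadratic = linear[:, i] * linear[:, j]
    const = np.ones((linear.shape[0], 1))
    return np.hstack([const, linear, quadratic])


def fit_ridge(features, targets, ridge=1e-8):
    """Linear readout W minimizing ||features @ W - targets||^2 + ridge * ||W||^2."""
    A = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(A, features.T @ targets)


# Logistic map x_{t+1} = r x_t (1 - x_t): quadratic in x_t.
r, x = 3.9, [0.3]
for _ in range(500):
    x.append(r * x[-1] * (1 - x[-1]))
x = np.array(x).reshape(-1, 1)

phi = nvar_features(x[:-1], k=1)        # features of x_t
W = fit_ridge(phi, x[1:])               # predict x_{t+1}
print(np.max(np.abs(phi @ W - x[1:])))  # near zero
```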
The authors introduce Next-Generation Reservoir Computing (NG-RC), which is essentially an NVAR.</description></item><item><title>Notes on: Formal Definitions of Unbounded Evolution and Innovation Reveal Universal Mechanisms for Open-Ended Evolution in Dynamical Systems by Adams, A., Zenil, H., Davies, P. C. W., &amp; Walker, S. I. (2017)</title><link>https://hugocisneros.com/notes/adamsformaldefinitionsunbounded2017/</link><pubDate>Thu, 27 Jan 2022 12:05:00 +0100</pubDate><guid>https://hugocisneros.com/notes/adamsformaldefinitionsunbounded2017/</guid><description>source (Adams et al. 2017) tags Open-ended Evolution, Cellular automata DONE Summary This paper defines two properties for dynamical systems which are claimed to be related to open-ended evolution: Unbounded evolution (UE) and Innovation (INN). The combination of these two properties makes a system open-ended according to this paper&amp;rsquo;s definition. For such properties to be possible, a system has to be decomposed into two entities that interact with each other:</description></item><item><title>Moore-Penrose inverse</title><link>https://hugocisneros.com/notes/moore_penrose_inverse/</link><pubDate>Thu, 27 Jan 2022 09:53:00 +0100</pubDate><guid>https://hugocisneros.com/notes/moore_penrose_inverse/</guid><description>tags Mathematics The MP inverse exists and is unique for any matrix \(A\). When \(A\) has linearly independent columns, the MP inverse \(A^+\) is \[ A^+ = (A^* A)^{-1} A^* \]
Where \(A^*\) is the conjugate transpose of \(A\).
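A quick numerical check of the full-column-rank formula against NumPy's `np.linalg.pinv`. It uses a random complex \(6 \times 3\) matrix, whose columns are linearly independent with probability one. The matrix and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3)) + 1j * rng.normal(size=(6, 3))
A_star = A.conj().T                           # conjugate transpose A^*

A_plus = np.linalg.inv(A_star @ A) @ A_star   # (A^* A)^{-1} A^*
print(np.allclose(A_plus, np.linalg.pinv(A)))  # True
print(np.allclose(A_plus @ A, np.eye(3)))      # True: a left inverse of A
```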
If the rows are linearly independent, \[ A^+ = A^* (AA^*)^{-1} \].</description></item><item><title>Jethro Kuan</title><link>https://hugocisneros.com/notes/jethro_kuan/</link><pubDate>Wed, 26 Jan 2022 17:56:00 +0100</pubDate><guid>https://hugocisneros.com/notes/jethro_kuan/</guid><description>website Personal website Jethro Kuan is a developer from Singapore. He developed Org-roam.</description></item><item><title>Generative adversarial networks</title><link>https://hugocisneros.com/notes/generative_adversarial_networks/</link><pubDate>Wed, 19 Jan 2022 12:14:00 +0100</pubDate><guid>https://hugocisneros.com/notes/generative_adversarial_networks/</guid><description>tags Neural networks, Generative modelling Generative adversarial networks are a type of generative model. They are close in spirit to Variational autoencoders but differ in key ways. The main difference is the training procedure, which seeks an equilibrium in an adversarial game between a generator and a discriminator.
Are GANs glorified PCA? (Richardson, Weiss 2020) This paper seems to show that image-to-image translation models are ill-posed, which implies that the image transformation should always be very local.</description></item><item><title>ALife Conference</title><link>https://hugocisneros.com/notes/alife_conference/</link><pubDate>Wed, 19 Jan 2022 12:13:00 +0100</pubDate><guid>https://hugocisneros.com/notes/alife_conference/</guid><description> tags Artificial life The ALife conference is about artificial life in its many forms. This includes topics like cellular automata, complex systems, biological life, etc.
ALife 2020</description></item><item><title>Coding</title><link>https://hugocisneros.com/notes/coding/</link><pubDate>Wed, 19 Jan 2022 12:11:00 +0100</pubDate><guid>https://hugocisneros.com/notes/coding/</guid><description>tags Computer science, Programming languages For compiled programming languages, code has to be translated to machine code through a process called compilation.</description></item><item><title>Graph cellular automata</title><link>https://hugocisneros.com/notes/graph_cellular_automata/</link><pubDate>Mon, 17 Jan 2022 17:08:00 +0100</pubDate><guid>https://hugocisneros.com/notes/graph_cellular_automata/</guid><description>tags Cellular automata, Graphs This concept was mentioned in (O&amp;rsquo;Sullivan 2001), although this may not be its first mention.
The idea is also similar to graph convolutional networks and other graph neural networks, where the goal is to construct an update function for a node that doesn&amp;rsquo;t depend on the number of neighboring nodes. This enables running cellular automata on non-grid structures.
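A minimal sketch of such an update. It assumes binary node states and an outer-totalistic, Life-like rule that only looks at the node's own state and the number of alive neighbors, so it applies unchanged to nodes of any degree. The graph is a plain adjacency dictionary. `graph_ca_step` and the birth/survival sets are illustrative choices.

```python
def graph_ca_step(states, neighbors, birth=(3,), survive=(2, 3)):
    """One synchronous update of an outer-totalistic binary rule on any graph.

    states: dict node -> 0 or 1; neighbors: dict node -> list of adjacent nodes.
    Only the number of alive neighbors matters, so node degree is unconstrained.
    """
    new_states = {}
    for node, alive in states.items():
        count = sum(states[other] for other in neighbors[node])
        if alive:
            new_states[node] = int(count in survive)
        else:
            new_states[node] = int(count in birth)
    return new_states


# A 5-node ring where a dead node is born with exactly 1 alive neighbor
# and nothing survives (illustrative rule, not Life's B3/S23).
neighbors = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
states = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}
print(graph_ca_step(states, neighbors, birth=(1,), survive=()))
# {0: 0, 1: 1, 2: 0, 3: 0, 4: 1}
```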
This idea was recently published at NeurIPS 2021 (Grattarola et al.</description></item><item><title>Cellular neural networks</title><link>https://hugocisneros.com/notes/cellular_neural_networks/</link><pubDate>Mon, 17 Jan 2022 16:44:00 +0100</pubDate><guid>https://hugocisneros.com/notes/cellular_neural_networks/</guid><description> tags Cellular automata, Neural networks resources Scholarpedia, (Chua, Yang 1988; Chua, Yang 1988)
Bibliography L.O. Chua, L. Yang. October 1988. "Cellular Neural Networks: Applications". IEEE Transactions on Circuits and Systems 35 (10):1273–90. DOI. L.O. Chua, L. Yang. October 1988. "Cellular Neural Networks: Theory". IEEE Transactions on Circuits and Systems 35 (10):1257–72. DOI.</description></item><item><title>Intrinsic motivation</title><link>https://hugocisneros.com/notes/intrinsic_motivation/</link><pubDate>Mon, 17 Jan 2022 13:44:00 +0100</pubDate><guid>https://hugocisneros.com/notes/intrinsic_motivation/</guid><description>tags Reinforcement learning, Robotics references :
According to (Ryan, Deci 2000) (p. 56),
Intrinsic motivation is defined as the doing of an activity for its inherent satisfaction rather than for some separable consequence. When intrinsically motivated, a person is moved to act for the fun or challenge entailed rather than because of external products, pressures, or rewards.
It is defined by contrast with extrinsic motivation:
Extrinsic motivation is a construct that pertains whenever an activity is done in order to attain some separable outcome.</description></item><item><title>Supervised learning</title><link>https://hugocisneros.com/notes/supervised_learning/</link><pubDate>Thu, 04 Nov 2021 14:23:00 +0100</pubDate><guid>https://hugocisneros.com/notes/supervised_learning/</guid><description>tags Machine learning Data Input/output example pairs: \[ \{(x_i, y_i)\}_{i\leq n} \sim_{iid} \mathbb{P}, \quad \mathbb{P} \in \mathcal{P}(\mathcal{X} \times \mathcal{Y}) \text{ unknown} \]
Mapping We search for a mapping \(f: \mathcal{X} \rightarrow \mathcal{Y}\). It is also common to parameterize this mapping with a parameter \(\theta \in \mathbb{R}^d\) and write \(h: \mathcal{X} \times \mathbb{R}^d \rightarrow \mathcal{Y}\).
The prediction \(\hat{y}\) is written
\[ \hat{y} = f(x) = h(x, \theta) \]
Objective The goal is to find the above mapping so as to minimize an objective.</description></item><item><title>Catastrophic forgetting</title><link>https://hugocisneros.com/notes/catastrophic_forgetting/</link><pubDate>Mon, 18 Oct 2021 09:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/catastrophic_forgetting/</guid><description>tags Machine learning Catastrophic forgetting is the name given to a common problem of machine learning models: when training on some new data from a new distribution (a new &amp;ldquo;task&amp;rdquo;), many models forget what they learned from the first task.
This isn&amp;rsquo;t surprising, since models are trained with a loss function that often depends only on the task at hand and does not constrain the model to retain past information.</description></item><item><title>Conway's Game of Life</title><link>https://hugocisneros.com/notes/conway_s_game_of_life/</link><pubDate>Mon, 18 Oct 2021 08:56:00 +0200</pubDate><guid>https://hugocisneros.com/notes/conway_s_game_of_life/</guid><description>tags Cellular automata resources (Gardner 1970) It is one of the most famous Cellular automata rules, invented as a game by mathematician John Conway.
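One step of the rule can be written with NumPy: a dead cell with exactly 3 live neighbors is born, and a live cell with 2 or 3 live neighbors survives. The sketch assumes a toroidal grid, so `np.roll` handles the borders.

```python
import numpy as np


def life_step(grid):
    """One Game of Life step (B3/S23) on a torus."""
    # Count the 8 neighbors of every cell by summing shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, di, axis=0), dj, axis=1)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    born = np.logical_and(grid == 0, neighbors == 3)
    survives = np.logical_and(grid == 1, np.isin(neighbors, (2, 3)))
    return np.logical_or(born, survives).astype(grid.dtype)


grid = np.zeros((5, 5), dtype=int)
grid[2, 1:4] = 1                               # horizontal blinker
step1 = life_step(grid)
print(step1[1:4, 2])                           # [1 1 1]: now vertical
print(np.array_equal(life_step(step1), grid))  # True: period 2
```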
Learn the game of life with a neural network This paper investigates how hard it is for neural networks to approximate the Game of Life rule (Springer, Kenyon 2020).
Bibliography Martin Gardner. October 1970. "Mathematical Games". Scientific American 223 (4):120–23. DOI. Jacob M. Springer, Garrett T.</description></item><item><title>Variational autoencoders</title><link>https://hugocisneros.com/notes/variational_autoencoders/</link><pubDate>Thu, 07 Oct 2021 13:37:00 +0200</pubDate><guid>https://hugocisneros.com/notes/variational_autoencoders/</guid><description>tags Neural networks resources (Bishop 1994) Variational autoencoders (VAEs) are a type of generative Autoencoders.
They use a Bayesian latent encoding for the input dataset.
VAEs vs. GANs VAEs fell out of fashion when GANs became popular, because GANs produced visually compelling results more easily. However, some later works suggest that VAEs have similar potential (Vahdat, Kautz 2020).
Bibliography Christopher M.</description></item><item><title>Compression</title><link>https://hugocisneros.com/notes/compression/</link><pubDate>Tue, 05 Oct 2021 17:52:00 +0200</pubDate><guid>https://hugocisneros.com/notes/compression/</guid><description>Compression with Neural networks Compression can be done with the help of neural networks as estimators of the sequence&amp;rsquo;s next character probability (Schmidhuber, Heil 1996).
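The link between prediction and compression can be made concrete. An entropy coder (for example arithmetic coding) driven by a model that assigns probability \(p\) to the next character spends about \(-\log_2 p\) bits on it, so better predictions mean shorter codes. In the sketch below, an adaptive, Laplace-smoothed character-frequency model stands in for the neural network predictor. The function name and model are illustrative assumptions.

```python
import math
from collections import Counter


def adaptive_code_length(text, alphabet):
    """Bits an ideal entropy coder would spend on `text` under an adaptive order-0 model."""
    counts = Counter()
    bits = 0.0
    for i, ch in enumerate(text):
        # Laplace-smoothed prediction from the i characters seen so far.
        p = (counts[ch] + 1) / (i + len(alphabet))
        bits -= math.log2(p)
        counts[ch] += 1
    return bits


print(adaptive_code_length("a" * 100, "ab"))   # predictable text: about 6.66 bits
print(adaptive_code_length("ab" * 50, "ab"))   # balanced text: about 103 bits
```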
Compression as a measure of Artificial Intelligence Mahoney argues in (Mahoney 1999) that compressing a natural language corpus well amounts to predicting its distribution optimally. A good compression algorithm &amp;ldquo;learns&amp;rdquo; features of the language to make better predictions.</description></item><item><title>Graph compression</title><link>https://hugocisneros.com/notes/graph_compression/</link><pubDate>Thu, 30 Sep 2021 14:47:00 +0200</pubDate><guid>https://hugocisneros.com/notes/graph_compression/</guid><description> tags Compression, Graphs (Bouritsas et al. 2021)
Bibliography Giorgos Bouritsas, Andreas Loukas, Nikolaos Karalias, Michael M. Bronstein. July 5, 2021. "Partition and Code: Learning How to Compress Graphs". http://arxiv.org/abs/2107.01952.</description></item><item><title>Neural networks as dynamical systems</title><link>https://hugocisneros.com/notes/neural_networks_as_dynamical_systems/</link><pubDate>Thu, 30 Sep 2021 14:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/neural_networks_as_dynamical_systems/</guid><description>tags Neural networks, Dynamical systems Neural networks can be seen as dynamical systems in different contexts.
Recurrent networks With Recurrent neural networks, the dynamical system analogy is very striking. These networks evolve step by step in time by updating an internal state with a fixed rule. Usually the state dynamics are not studied, because the recurrent network is designed to complete some fixed task.
The notion of attractor can be defined for such networks, making them related to the notion of attractor networks.</description></item><item><title>Learning in dynamical systems</title><link>https://hugocisneros.com/notes/learning_in_dynamical_systems/</link><pubDate>Thu, 30 Sep 2021 12:56:00 +0200</pubDate><guid>https://hugocisneros.com/notes/learning_in_dynamical_systems/</guid><description> tags Machine learning, Dynamical systems resources (Weinan 2017) Bibliography E Weinan. March 2017. "A Proposal on Machine Learning via Dynamical Systems". Communications in Mathematics and Statistics 5 (1):1–11. DOI.</description></item><item><title>Tropical semiring</title><link>https://hugocisneros.com/notes/tropical_semiring/</link><pubDate>Thu, 30 Sep 2021 12:36:00 +0200</pubDate><guid>https://hugocisneros.com/notes/tropical_semiring/</guid><description>tags Mathematics Definition The min tropical semiring is the semiring \((\mathbb{R} \cup \{ +\infty \}, \oplus, \otimes )\) with the operations:
\(x \oplus y = \min(x, y)\) \(x \otimes y = x + y\) The unit for \(\oplus\) is \(+\infty\) and the unit for \(\otimes\) is \(0\). The max tropical semiring is defined similarly by replacing \(\min\) with \(\max\) (and \(+\infty\) with \(-\infty\)).
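To make the definition concrete, here is a small sketch of the min-plus operations and the matrix product they induce. The function names are illustrative. If (+, ×) is replaced by (min, +) in ordinary matrix multiplication, repeated products of a graph's weight matrix give shortest-path lengths. This is the algebraic view behind Bellman-Ford-style algorithms.

```python
import math

INF = math.inf  # unit of tropical addition

def t_add(x, y):
    """Tropical addition: x ⊕ y = min(x, y)."""
    return min(x, y)

def t_mul(x, y):
    """Tropical multiplication: x ⊗ y = x + y."""
    return x + y

def t_matmul(A, B):
    """Min-plus matrix product: (A ⊗ B)[i][j] = min over k of A[i][k] + B[k][j]."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[INF] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] = t_add(C[i][j], t_mul(A[i][k], B[k][j]))
    return C

def shortest_paths(W):
    """All-pairs shortest path lengths via repeated min-plus products.

    W[i][j] is the edge weight (INF if there is no edge) and W[i][i] = 0,
    so the k-th tropical power of W holds the best paths with at most k edges.
    """
    D = W
    for _ in range(len(W) - 1):
        D = t_matmul(D, W)
    return D

# Directed graph: 0 -> 1 (weight 4), 1 -> 2 (weight 1), 2 -> 0 (weight 2)
W = [
    [0, 4, INF],
    [INF, 0, 1],
    [2, INF, 0],
]
print(shortest_paths(W))  # → [[0, 4, 5], [3, 0, 1], [2, 6, 0]]
```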
Relation with shortest path algorithms There is an interesting connection between the min tropical semiring and Dijkstra&amp;rsquo;s algorithm.</description></item><item><title>Arithmetic coding</title><link>https://hugocisneros.com/notes/arithmetic_coding/</link><pubDate>Thu, 23 Sep 2021 09:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/arithmetic_coding/</guid><description> tags Compression, Entropy coding resources (Witten et al. 1987) Bibliography Ian H. Witten, Radford M. Neal, John G. Cleary. 1987. "Arithmetic Coding for Data Compression". Communications of the ACM 30 (6). ACM New York, NY, USA:520–40.</description></item><item><title>Blue noise</title><link>https://hugocisneros.com/notes/blue_noise/</link><pubDate>Thu, 23 Sep 2021 09:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/blue_noise/</guid><description> tags Noise resources (Wong, Wong 2017) Bibliography Kin-Ming Wong, Tien-Tsin Wong. June 1, 2017. "Blue Noise Sampling Using an N-body Simulation-Based Method". The Visual Computer 33 (6):823–32. DOI.</description></item><item><title>Entropy coding</title><link>https://hugocisneros.com/notes/entropy_coding/</link><pubDate>Thu, 23 Sep 2021 08:50:00 +0200</pubDate><guid>https://hugocisneros.com/notes/entropy_coding/</guid><description> tags Compression, Entropy</description></item><item><title>Poincaré recurrence time</title><link>https://hugocisneros.com/notes/poincare_recurrence_time/</link><pubDate>Tue, 14 Sep 2021 22:50:00 +0200</pubDate><guid>https://hugocisneros.com/notes/poincare_recurrence_time/</guid><description>tags Dynamical systems The Poincaré recurrence time for a finite dynamical system is the maximal theoretical time after which the system must revisit a previous state (its initial state, if the dynamics are reversible), so that the trajectory repeats.
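A minimal sketch of this bound, assuming a deterministic update on a finite state space. For illustration it uses elementary CA rule 90 on a ring of n binary cells, and the helper names are hypothetical. The loop iterates until some configuration repeats. By the pigeonhole principle this must happen within the number of possible states, and from then on the trajectory is periodic.

```python
def rule90_step(state):
    """Elementary CA rule 90 on a ring: each cell becomes left XOR right."""
    n = len(state)
    return tuple(state[(i - 1) % n] ^ state[(i + 1) % n] for i in range(n))

def find_cycle(state, step):
    """Return (transient, period) of the trajectory starting at `state`."""
    seen = {}  # configuration -> time of its first visit
    t = 0
    while state not in seen:
        seen[state] = t
        state = step(state)
        t += 1
    first = seen[state]
    return first, t - first

n = 8
initial = (0, 0, 0, 1, 0, 0, 0, 0)
transient, period = find_cycle(initial, rule90_step)
# The first repeat happens after transient + period steps, at most 2**n.
print(transient, period, transient + period, 2 ** n)  # → 4 1 5 256
```

Here the single live cell dies out to the all-zero configuration after four steps and stays there. The trajectory therefore becomes periodic without ever returning to its initial state, which is possible because rule 90 on this ring is not reversible.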
In the case of a cellular automaton on a grid of size \(n\) with \(k\) possible states per cell, there are \(k^n\) possible configurations, so the recurrence time is \(t_P = k^n\).</description></item><item><title>Monero</title><link>https://hugocisneros.com/notes/monero/</link><pubDate>Tue, 14 Sep 2021 22:05:00 +0200</pubDate><guid>https://hugocisneros.com/notes/monero/</guid><description> tags Cryptography resources Monero</description></item><item><title>Ring signatures</title><link>https://hugocisneros.com/notes/ring_signatures/</link><pubDate>Tue, 14 Sep 2021 22:01:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ring_signatures/</guid><description>tags Cryptography A ring signature is a protocol that allows a single member of a group of public-private key pairs to sign a message in such a way that anyone can check that the signature was made by someone from that group, but it is impossible to tell who exactly. It should also be very hard to create a fake signature without knowing any of the private keys from the group.</description></item><item><title>Cellular automata as recurrent neural networks</title><link>https://hugocisneros.com/notes/cellular_automata_as_recurrent_neural_networks/</link><pubDate>Mon, 06 Sep 2021 16:06:00 +0200</pubDate><guid>https://hugocisneros.com/notes/cellular_automata_as_recurrent_neural_networks/</guid><description>tags Cellular automata, Recurrent neural networks Since cellular automata are a kind of dynamical system, they may also be seen as a type of recurrent neural network. 
They can even be seen as a recurrent-convolutional network, because the update of each &amp;ldquo;hidden neuron&amp;rdquo; depends only on its neighbors and the update rule is shared across the whole hidden state.</description></item><item><title>Reductionism</title><link>https://hugocisneros.com/notes/reductionism/</link><pubDate>Fri, 03 Sep 2021 15:14:00 +0200</pubDate><guid>https://hugocisneros.com/notes/reductionism/</guid><description>tags Philosophy, Physics Reductionism is a philosophy according to which the laws of physics are relatively simple and could be expressed concisely. If we were to understand and model all those laws, we could explain any given phenomenon by breaking it down into its smaller parts until we just need to apply these simple laws.
With the current state of physics, it seems we are close to understanding many of these fundamental laws (although major shortcomings remain in the theory).</description></item><item><title>Notes on: Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Xiong, Y., Zeng, Z., Chakraborty, R., Tan, M., Fung, G., Li, Y., &amp; Singh, V. (2021)</title><link>https://hugocisneros.com/notes/xiongnystromformernystr2021/</link><pubDate>Thu, 02 Sep 2021 12:52:00 +0200</pubDate><guid>https://hugocisneros.com/notes/xiongnystromformernystr2021/</guid><description>tags Transformers source (Xiong et al. 2021) TODO Summary This paper describes a way of applying the Nyström method, a low-rank matrix approximation technique, to transformers. More precisely, the approximation is applied to the softmax attention matrix computed in the self-attention mechanism.
This approximation addresses one of the biggest downsides of attention: its computational complexity, which is quadratic in the sequence length \(n\). The authors claim that their method reduces it from \(O(n^2)\) to \(O(n)\).
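Below is a rough NumPy sketch of the approximation, under several simplifying assumptions. The m landmarks are means of contiguous segments of the queries and keys (the paper's landmark choice, to our understanding), and n is assumed divisible by m. The pseudo-inverse is computed exactly with `np.linalg.pinv` rather than with the iterative approximation the authors use. The full n×n attention matrix is replaced by a product of n×m, m×m and m×n softmax matrices, so for fixed m the cost grows linearly with n.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def exact_attention(Q, K, V):
    """Standard softmax attention: builds the full n x n matrix."""
    d = Q.shape[1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def nystrom_attention(Q, K, V, m):
    """Nyström approximation of softmax attention with m landmarks.

    Assumes n is divisible by m; landmarks are means of contiguous segments.
    """
    n, d = Q.shape
    Q_l = Q.reshape(m, n // m, d).mean(axis=1)  # m landmark queries
    K_l = K.reshape(m, n // m, d).mean(axis=1)  # m landmark keys
    scale = 1.0 / np.sqrt(d)
    F = softmax(Q @ K_l.T * scale)    # n x m
    A = softmax(Q_l @ K_l.T * scale)  # m x m
    B = softmax(Q_l @ K.T * scale)    # m x n
    # Multiply right-to-left so that no n x n matrix is ever formed.
    return F @ (np.linalg.pinv(A) @ (B @ V))

rng = np.random.default_rng(0)
n, d = 64, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
approx = nystrom_attention(Q, K, V, m=8)
exact = exact_attention(Q, K, V)
print(np.abs(approx - exact).max())  # approximation error with 8 landmarks
```

With m = n, each token is its own landmark and the sketch reproduces exact attention up to floating-point error. Smaller m trades accuracy for speed.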
The goal of the method is to efficiently approximate the matrix</description></item><item><title>Graph neural networks</title><link>https://hugocisneros.com/notes/graph_neural_networks/</link><pubDate>Thu, 02 Sep 2021 12:49:00 +0200</pubDate><guid>https://hugocisneros.com/notes/graph_neural_networks/</guid><description>tags Neural networks, Graphs Basic properties To operate on graphs, a neural network must be invariant to isomorphism of these graphs. This translates to permutation invariance for the nodes of a graph.
\[ f(\mathbf{PX}) = f(\mathbf{X}) \]
where \(\mathbf{P}\) is a permutation matrix. For simple sets, this amounts to applying a node-wise transformation followed by a permutation-invariant aggregator (sum/max/avg/&amp;hellip;). This was done in (Zaheer et al. 2018).
\[ f(\mathbf{X}) = \phi\left( \bigoplus_i \psi(\mathbf{x}_i) \right) \]</description></item><item><title>Notes on: The information theory of individuality by Krakauer, D., Bertschinger, N., Olbrich, E., Flack, J. C., &amp; Ay, N. (2020)</title><link>https://hugocisneros.com/notes/krakauerinformationtheoryindividuality2020/</link><pubDate>Thu, 02 Sep 2021 12:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/krakauerinformationtheoryindividuality2020/</guid><description>tags Information theory, Life source (Krakauer et al. 2020) Summary This paper introduces an information theoretic definition of individuality for complex systems.
In a few words, the authors&amp;rsquo; idea of individuality is based on the amount of information transmitted through time.
If the information transmitted forward in time is close to maximal, we take that as evidence for individuality.
Formally, a system \(\mathcal{S}\) is considered in interaction with an environment \(\mathcal{E}\).</description></item><item><title>Notes on: Growing Neural Cellular Automata by Mordvintsev, A., Randazzo, E., Niklasson, E., &amp; Levin, M. (2020)</title><link>https://hugocisneros.com/notes/mordvintsevgrowingneuralcellular2020/</link><pubDate>Wed, 01 Sep 2021 17:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/mordvintsevgrowingneuralcellular2020/</guid><description>tags Cellular automata source (Mordvintsev et al. 2020) Summary This paper introduces interesting ideas for training cellular automata, implemented as CNNs, to grow stable, self-repairing structures. The automata have 16-dimensional continuous states. The main modeling ideas are:
Use hard-coded filters for the initial perception step. The filters are two Sobel convolutions (horizontal and vertical gradients), whose outputs are concatenated with the current state. Update rules are then 1×1 convolutions (the same small dense network applied to every cell) acting on the resulting \(3 \times 16 = 48\)-dimensional perception vector.</description></item><item><title>Cellular automata as convolutional neural networks</title><link>https://hugocisneros.com/notes/cellular_automata_as_cnns/</link><pubDate>Wed, 01 Sep 2021 17:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/cellular_automata_as_cnns/</guid><description>tags Cellular automata, Convolutional neural networks Motivation Cellular automata are computational models based on several principles, such as translation invariance and parallel computation. These principles also motivated the creation of Convolutional neural networks (used initially for images and text), making this model well-suited to reason about cellular automata.
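Here is a minimal sketch of this correspondence for a one-dimensional elementary CA, assuming periodic boundaries. The function name is illustrative. A fixed kernel (4, 2, 1) reads the three-cell neighborhood and is shared across all positions, as in a convolution. The resulting code (0 to 7) then indexes into the rule's lookup table, which plays the role of a nonlinearity.

```python
def ca_step(state, rule):
    """One step of an elementary CA, written as a 'convolution' plus lookup.

    state: list of 0/1 cells on a ring; rule: Wolfram rule number (0 to 255).
    """
    n = len(state)
    kernel = (4, 2, 1)  # weights for (left, center, right), shared everywhere
    table = [(rule >> code) % 2 for code in range(8)]  # bit `code` of the rule
    out = []
    for i in range(n):
        neighborhood = (state[(i - 1) % n], state[i], state[(i + 1) % n])
        code = sum(w * s for w, s in zip(kernel, neighborhood))  # 0..7
        out.append(table[code])
    return out

state = [0] * 7 + [1] + [0] * 7
for _ in range(4):
    print("".join(".#"[c] for c in state))
    state = ca_step(state, 30)
```

Running rule 30 from a single live cell prints the first rows of its familiar irregular triangle. In a learned model, the lookup table would be replaced by trainable layers, while the shared local kernel stays the same.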
There is indeed a deep connection between the two models, as if they were two expressions of the same idea: spatially organized information can be processed locally and in parallel.</description></item><item><title>Computing in cellular automata</title><link>https://hugocisneros.com/notes/computing_in_cellular_automata/</link><pubDate>Wed, 01 Sep 2021 17:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/computing_in_cellular_automata/</guid><description>tags Unconventional computing, Cellular automata resources (Mitchell 2005; Wolfram 2002) Cellular automata are computational models capable of interesting emergent behavior. A major challenge is to understand which CA rules are doing useful or efficient computations. It is not clear how these systems could be programmed or made to compute a particular function.
Hand-engineered CA rules The images below show CA rules that can compute non-trivial functions (images are from (Wolfram 2002), see A New Kind of Science online).</description></item><item><title>Epsilon machines</title><link>https://hugocisneros.com/notes/epsilon_machines/</link><pubDate>Wed, 01 Sep 2021 17:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/epsilon_machines/</guid><description>(Crutchfield, Young 1989)
Bibliography James P. Crutchfield, Karl Young. July 10, 1989. "Inferring Statistical Complexity". Physical Review Letters 63 (2):105–8. DOI.</description></item><item><title>Style transfer</title><link>https://hugocisneros.com/notes/style_transfer/</link><pubDate>Wed, 01 Sep 2021 17:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/style_transfer/</guid><description>tags Computer vision Style transfer is the process of transferring some visual features from one image to another image while preserving the latter&amp;rsquo;s content information. Since both these notions may be considered subjective, the problem of style transfer is not well defined and may be approached in many ways.
Style transfer with CNNs This is an early example of style transfer with convolutional neural networks: (Gatys et al. 2016)</description></item><item><title>Notes on: Reservoir Computing meets Recurrent Kernels and Structured Transforms by Dong, J., Ohana, R., Rafayelyan, M., &amp; Krzakala, F. (2020)</title><link>https://hugocisneros.com/notes/dongreservoircomputingmeets2020/</link><pubDate>Mon, 30 Aug 2021 22:02:00 +0200</pubDate><guid>https://hugocisneros.com/notes/dongreservoircomputingmeets2020/</guid><description>source (Dong et al. 2020) tags Reservoir computing, Kernel Methods DONE Summary This paper presents a connection between reservoir computing with large reservoirs and kernel methods.
The authors formulate a reservoir computing model as a form of recurrent kernel iteration. If the reservoir update is written \[ x^{(t+1)} = \dfrac{1}{\sqrt{N}} f \left(W_r x^{(t)} + W_i i^{(t)} \right) \] with \(x^{(t)}\) the state of the reservoir at time \(t\) and \(i^{(t)}\) sequential input at time \(t\), \(W_r \in \mathbb{R}^{N\times N}\) and \(W_i \in \mathbb{R}^{N\times d}\), we may re-frame it as a random feature embedding of the vector \(\left[ x^{(t)} , i^{(t)} \right]\) with the matrix \(W = [W_r, W_i]\).</description></item><item><title>Differential equations</title><link>https://hugocisneros.com/notes/differential_equations/</link><pubDate>Mon, 30 Aug 2021 17:12:00 +0200</pubDate><guid>https://hugocisneros.com/notes/differential_equations/</guid><description> tags Mathematics</description></item><item><title>Softmax</title><link>https://hugocisneros.com/notes/softmax/</link><pubDate>Fri, 27 Aug 2021 15:41:00 +0200</pubDate><guid>https://hugocisneros.com/notes/softmax/</guid><description> tags Applied maths The Softmax can refer to two mathematical functions:
In machine learning a softmax is the function which normalizes a vector of values to a probability vector: \(\text{softmax}(\mathbf{x}) = \dfrac{e^{\mathbf{x}}}{\sum_i e^{x_i}}\) where \(\mathbf{x} = (x_i) \in \mathbb{R}^n\). This function could also be called soft-argmax because it is a smooth approximation of the discrete argmax function. It may also refer to a smoothed maximum function like \(\epsilon \log \sum_i \exp (x_i / \epsilon)\) which approximates the \(\text{max}\) function in the limit \(\epsilon \rightarrow 0\)</description></item><item><title>Notes on: Learned Initializations for Optimizing Coordinate-Based Neural Representations by Tancik, M., Mildenhall, B., Wang, T., Schmidt, D., Srinivasan, P. P., Barron, J. T., &amp; Ng, R. (2021)</title><link>https://hugocisneros.com/notes/tanciklearnedinitializationsoptimizing2021/</link><pubDate>Wed, 25 Aug 2021 16:39:00 +0200</pubDate><guid>https://hugocisneros.com/notes/tanciklearnedinitializationsoptimizing2021/</guid><description>tags Neural radiance fields, Meta-learning source (Tancik et al. 2021) DONE Summary This paper explores meta-learning techniques for improving the quality and speed of convergence of learned implicit neural representations.
The authors use meta-learning to optimize the initial weights \(\theta_0\) of the neural network so as to minimize the loss \(L(\theta_m)\) reached after \(m\) optimization steps on new, unseen observations.
As a meta-learning problem, there is an inner loop and an outer loop:</description></item><item><title>Meta-learning</title><link>https://hugocisneros.com/notes/meta_learning/</link><pubDate>Wed, 25 Aug 2021 16:00:00 +0200</pubDate><guid>https://hugocisneros.com/notes/meta_learning/</guid><description>tags Machine learning Constrained meta-learning (Kirsch, Schmidhuber 2021)
Meta-learning of initialization The goal is to learn the initialization of neural network parameters or recurrent neural network initial states in order to make the training faster or less prone to getting stuck in local minima.
Example for implicit neural representations: (Tancik et al. 2021)
Meta-learning algorithms MAML (Finn et al. 2017)
Reptile (Nichol et al. 2018)
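Here is a toy sketch of learning an initialization with a Reptile-style update, written in plain Python under simplifying assumptions. Every task is a one-dimensional quadratic loss (w - c)^2 whose target c is drawn uniformly from [2, 4]. The inner loop runs a few steps of gradient descent from the shared initialization, and the outer loop moves that initialization a small step toward the adapted weight. The task family and hyperparameters are hypothetical.

```python
import random

def inner_sgd(w, c, steps=5, lr=0.1):
    """Adapt w to one task with loss (w - c)**2 using plain gradient descent."""
    for _ in range(steps):
        grad = 2.0 * (w - c)
        w -= lr * grad
    return w

def reptile(num_iters=2000, meta_lr=0.1, seed=0):
    """Reptile meta-update: w0 += meta_lr * (adapted_w - w0)."""
    rng = random.Random(seed)
    w0 = 0.0
    for _ in range(num_iters):
        c = rng.uniform(2.0, 4.0)  # sample a task (its optimum c)
        w_task = inner_sgd(w0, c)  # inner loop: task-specific adaptation
        w0 += meta_lr * (w_task - w0)  # outer loop: move the initialization
    return w0

print(reptile())  # should end up close to 3.0, the middle of the task range
```

The learned initialization settles near the center of the task distribution. From that point, a few inner steps bring the weight close to any individual task's optimum.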
Bibliography Louis Kirsch, Jürgen Schmidhuber.</description></item><item><title>Notes on: Pretrained Transformers as Universal Computation Engines by Lu, K., Grover, A., Abbeel, P., &amp; Mordatch, I. (2021)</title><link>https://hugocisneros.com/notes/lupretrainedtransformersuniversal2021/</link><pubDate>Wed, 25 Aug 2021 15:19:00 +0200</pubDate><guid>https://hugocisneros.com/notes/lupretrainedtransformersuniversal2021/</guid><description>source (Lu et al. 2021) tags Transformers DONE Summary Different types of neural network architecture encode different kinds of biases. For example, convolutional neural networks perform local, translation-invariant operations and recurrent neural networks operate on sequential data.
One can use these biases in randomly initialized networks as a basis for interesting computations. This is one of the motivations for reservoir computing with echo-state networks, which use a fixed random recurrent neural network and a simple trainable linear readout to perform complex computations.</description></item><item><title>Notes on: Thinking Like Transformers by Weiss, G., Goldberg, Y., &amp; Yahav, E. (2021)</title><link>https://hugocisneros.com/notes/weissthinkingtransformers2021/</link><pubDate>Wed, 25 Aug 2021 15:19:00 +0200</pubDate><guid>https://hugocisneros.com/notes/weissthinkingtransformers2021/</guid><description>tags NLP, Computer science source (Weiss et al. 2021) DONE Summary This paper introduces a programming language that is inspired by the way Transformers process input data. The language is called Restricted Access Sequence Processing Language (RASP).
Data is represented as sequences, which is the structure transformers manipulate (since they have been designed for NLP applications). The language has two types of internal data representation:
Sequence operators (s-ops) are functions that translate sequences into sequences.</description></item><item><title>System of linear equations</title><link>https://hugocisneros.com/notes/system_of_linear_equations/</link><pubDate>Tue, 24 Aug 2021 15:04:00 +0200</pubDate><guid>https://hugocisneros.com/notes/system_of_linear_equations/</guid><description>tags Applied maths Such a system with \(m\) equations and \(n\) unknowns is often denoted \(Ax = b\), where \(A\) is an \(m\times n\) matrix and \(b\) is a vector of size \(m\).
There are multiple methods to solve such systems, each relying on different hypotheses.
System types Square matrix with full rank In the simplest case, a square matrix with full rank, the solution exists and is unique: \(x = A^{-1} b\)</description></item><item><title>Knuth-Morris-Pratt string-searching algorithm</title><link>https://hugocisneros.com/notes/knuth_morris_pratt_string_searching_algorithm/</link><pubDate>Tue, 24 Aug 2021 11:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/knuth_morris_pratt_string_searching_algorithm/</guid><description> tags Algorithm, Computer science resources Yurichev.com</description></item><item><title>Neural cellular automata and implicit representations</title><link>https://hugocisneros.com/notes/neural_cellular_automata_and_implicit_representations/</link><pubDate>Fri, 20 Aug 2021 15:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/neural_cellular_automata_and_implicit_representations/</guid><description> tags Cellular automata, Neural cellular automata</description></item><item><title>Notes on: Emergence in artificial life by Gershenson, C. (2021)</title><link>https://hugocisneros.com/notes/gershensonemergenceartificiallife2021/</link><pubDate>Tue, 03 Aug 2021 10:45:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gershensonemergenceartificiallife2021/</guid><description>tags Artificial life, Emergence source (Gershenson 2021) DONE Summary The paper introduces a complexity metric based on information. Emergence is first measured with Shannon&amp;rsquo;s information: \[E = - K \sum_{i} p_i \log p_i\]
Then the author argues that self-organization can be seen as the opposite of emergence, and measured with \[S = 1 - E\]
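This small sketch computes both quantities for a discrete distribution. The normalizing constant is an assumption here: K = 1/log2(b) for an alphabet of b symbols. This choice puts E in [0, 1], so that S = 1 - E also lies in [0, 1]. The function names are illustrative.

```python
import math

def emergence(probs):
    """E = -K * sum(p_i log2 p_i), assuming K = 1 / log2(b) for b symbols."""
    b = len(probs)
    K = 1.0 / math.log2(b)
    h = sum(-p * math.log2(p) for p in probs if p > 0)  # Shannon entropy, bits
    return K * h

def self_organization(probs):
    """S = 1 - E: high when the distribution is concentrated (ordered)."""
    return 1.0 - emergence(probs)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximal emergence, no self-organization
frozen = [1.0, 0.0, 0.0, 0.0]       # no emergence, maximal self-organization
print(emergence(uniform), self_organization(uniform))  # → 1.0 0.0
print(emergence(frozen), self_organization(frozen))    # → 0.0 1.0
```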
[&amp;hellip;] complex systems tend to exhibit both emergence and self-organization. Extreme emergence implies chaos, while extreme self-organization implies immutability.</description></item><item><title>Locality-Sensitive Hashing</title><link>https://hugocisneros.com/notes/locality_sensitive_hashing/</link><pubDate>Thu, 24 Jun 2021 08:45:00 +0200</pubDate><guid>https://hugocisneros.com/notes/locality_sensitive_hashing/</guid><description> tags Computer science resources Tyler Neylon&amp;rsquo;s blog</description></item><item><title>Zuse's thesis</title><link>https://hugocisneros.com/notes/zuse_s_thesis/</link><pubDate>Tue, 15 Jun 2021 09:58:00 +0200</pubDate><guid>https://hugocisneros.com/notes/zuse_s_thesis/</guid><description>tags Physics, Philosophy resources Juergen Schmidhuber&amp;rsquo;s page, (Schmidhuber 1999) Zuse&amp;rsquo;s thesis is the idea that the Universe could be running within a digital computer. It was formulated by Konrad Zuse in Rechnender Raum (Calculating Space) in 1969. The computer could be a very large Cellular automaton according to Zuse.
A computer program to simulate our Universe (and all the others) Systematically create and execute all programs for a universal computer, such as a Turing machine or a CA; the first program is run for one instruction every second step on average, the next for one instruction every second of the remaining steps on average, and so on.</description></item><item><title>Distillation</title><link>https://hugocisneros.com/notes/distillation/</link><pubDate>Mon, 14 Jun 2021 11:54:00 +0200</pubDate><guid>https://hugocisneros.com/notes/distillation/</guid><description>tags Machine learning, Neural networks, Transfer learning Distillation describes the process of transferring the performance of a large trained teacher neural network to an untrained student network.
Instead of training the target network to score best according to the task&amp;rsquo;s loss function, distillation trains it to match the output distribution or neuron activation patterns of the teacher network.
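A minimal sketch of the output-matching objective, assuming the common temperature-softened formulation. Teacher and student logits both go through a softmax with temperature T, and the student is penalized by the KL divergence from the teacher's distribution. The names and the value of T are illustrative. A full training loop would typically also mix in the ordinary task loss.

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T (larger T gives a softer distribution)."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher = [4.0, 1.0, 0.5]
print(distillation_loss([4.0, 1.0, 0.5], teacher))  # → 0.0 (perfect match)
print(distillation_loss([0.5, 1.0, 4.0], teacher))  # positive: mismatch
```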
A review: (Beyer et al. 2021).
Bibliography Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov.</description></item><item><title>Transfer learning</title><link>https://hugocisneros.com/notes/transfer_learning/</link><pubDate>Mon, 14 Jun 2021 11:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/transfer_learning/</guid><description> tags Machine learning</description></item><item><title>Automated theorem proving</title><link>https://hugocisneros.com/notes/automated_theorem_proving/</link><pubDate>Mon, 14 Jun 2021 11:25:00 +0200</pubDate><guid>https://hugocisneros.com/notes/automated_theorem_proving/</guid><description> tags Mathematics Machine learning for theorem proving</description></item><item><title>Genetic algorithms</title><link>https://hugocisneros.com/notes/genetic_algorithm/</link><pubDate>Mon, 14 Jun 2021 10:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/genetic_algorithm/</guid><description>Genetic algorithms can be used as optimization algorithms for search problems, where usual optimization techniques such as gradient-based ones aren&amp;rsquo;t very effective.
These methods are loosely based on evolution in biological life, implementing a limited form of variation and selection to progress towards better fitness (measured by a specific fitness function).
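Here is a toy sketch of such a loop on bit strings. It relies on assumptions that are not in this note. Fitness is the number of ones (the classic OneMax toy problem), and the fittest individuals are kept as parents (truncation selection). Offspring come from one-point crossover plus per-bit mutation. All names and hyperparameters are illustrative.

```python
import random

def fitness(bits):
    """OneMax: count the ones; the optimum is the all-ones string."""
    return sum(bits)

def crossover(a, b, rng):
    """One-point crossover: prefix of one parent, suffix of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - x if rate > rng.random() else x for x in bits]

def evolve(n_bits=20, pop_size=30, n_parents=10, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fittest individuals as parents.
        parents = sorted(pop, key=fitness, reverse=True)[:n_parents]
        # Variation: recombine random parent pairs and mutate the children.
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), 1.0 / n_bits, rng)
            for _ in range(pop_size - n_parents)
        ]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best), best)
```

Because parents survive unchanged into the next generation, the best fitness in the population never decreases.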
New candidate solutions for a problem are constructed by randomly combining and mutating parent solutions. The best candidates are kept and become parents of the next generation.</description></item><item><title>Alternative learning mechanisms</title><link>https://hugocisneros.com/notes/alternative_learning_mechanisms/</link><pubDate>Mon, 14 Jun 2021 10:06:00 +0200</pubDate><guid>https://hugocisneros.com/notes/alternative_learning_mechanisms/</guid><description>tags Machine learning Many people, including Geoffrey Hinton, have raised concerns about the back-propagation algorithm, arguing that it is likely not a promising way to achieve Artificial Intelligence (see this Axios blog post).
Alternative learning mechanisms have been, and still are, studied in an attempt to approach the learning problem more effectively.
Direct feedback alignment (Nøkland 2016)
Hebbian learning The theory is sometimes summarized as &amp;ldquo;Cells that fire together wire together.</description></item><item><title>Roam research</title><link>https://hugocisneros.com/notes/roam_research/</link><pubDate>Tue, 18 May 2021 15:03:00 +0200</pubDate><guid>https://hugocisneros.com/notes/roam_research/</guid><description> tags Writing</description></item><item><title>Org-mode</title><link>https://hugocisneros.com/notes/org_mode/</link><pubDate>Tue, 18 May 2021 15:00:00 +0200</pubDate><guid>https://hugocisneros.com/notes/org_mode/</guid><description>tags Emacs, Writing Org is a markup language similar to Markdown. It was designed to be used in the Emacs editor, which offers special features for working with files in the Org format.
Org mode can be used as an agenda, task manager, writing and publishing tool, and many other things. Extensions in the form of Emacs packages add even more features to org-mode.</description></item><item><title>Church-Turing thesis</title><link>https://hugocisneros.com/notes/church_turing_thesis/</link><pubDate>Sun, 16 May 2021 14:56:00 +0200</pubDate><guid>https://hugocisneros.com/notes/church_turing_thesis/</guid><description>tags Computability theory A function on the natural numbers can be computed effectively if and only if it can be computed by a Turing Machine (or any equivalent computational model).
Implications for Zuse&amp;rsquo;s thesis An interesting implication of the Church-Turing thesis is that any Turing-complete computational model could in theory be &amp;ldquo;computing&amp;rdquo; our Universe. However, the overhead of running such a simulation can differ greatly from one model to another. There must be an optimal or close-to-optimal computational model for simulating life processes, and everyday observation suggests that it should be inherently parallel.</description></item><item><title>Notes on: More Is Different by Anderson, P. W. (1972)</title><link>https://hugocisneros.com/notes/andersonmoredifferent1972/</link><pubDate>Sun, 16 May 2021 14:37:00 +0200</pubDate><guid>https://hugocisneros.com/notes/andersonmoredifferent1972/</guid><description>tags Complexity, Philosophy source (Anderson 1972) This is a seminal paper discussing the fundamental laws of Physics and their relation to complexity.
Reductionism doesn&amp;rsquo;t imply constructionism It is generally accepted that the fundamental laws governing our Universe are relatively simple. We feel we understand many of these laws quite well. However, understanding these fundamental laws is far from enough to actually describe and reconstruct all the phenomena we witness.
The main fallacy in this kind of thinking is that the reductionist hypothesis does not by any means imply a &amp;ldquo;constructionist&amp;rdquo; one: The ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe.</description></item><item><title>Emergence in artificial life</title><link>https://hugocisneros.com/notes/emergence_in_artificial_life/</link><pubDate>Sun, 16 May 2021 14:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/emergence_in_artificial_life/</guid><description> tags Artificial life, Emergence resources (Gershenson 2021) Bibliography Carlos Gershenson. April 30, 2021. "Emergence in Artificial Life". http://arxiv.org/abs/2105.03216.&amp;nbsp;See notes</description></item><item><title>Notes on: Hopfield Networks is All You Need by Ramsauer, H., Schäfl, B., Lehner, J., Seidl, P., Widrich, M., Gruber, L., Holzleitner, M., … (2020)</title><link>https://hugocisneros.com/notes/ramsauerhopfieldnetworksall2020/</link><pubDate>Wed, 05 May 2021 16:26:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ramsauerhopfieldnetworksall2020/</guid><description> tags Hopfield Networks, Attention source (Ramsauer et al. 2020) resources Blog post TODO Summary This quote summarizes the paper well: &amp;ldquo;In order to integrate modern Hopfield networks into deep learning architectures, we have to make them continuous&amp;rdquo;.
TODO Comments Bibliography Hubert Ramsauer, Bernhard Schäfl, Johannes Lehner, Philipp Seidl, Michael Widrich, Lukas Gruber, Markus Holzleitner, et al. July 16, 2020. "Hopfield Networks Is All You Need". http://arxiv.org/abs/2008.02217.</description></item><item><title>Lempel-Ziv-Welch algorithm</title><link>https://hugocisneros.com/notes/lempel_ziv_welch_algorithm/</link><pubDate>Tue, 04 May 2021 09:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/lempel_ziv_welch_algorithm/</guid><description>tags Compression, Complexity papers (Lempel, Ziv 1976; Ziv, Lempel 1977; Ziv, Lempel 1978; Welch 1984; Storer, Szymanski 1982) Context The Lempel-Ziv approach underlying LZW was originally designed as a complexity (&amp;ldquo;randomness&amp;rdquo;) metric for finite sequences (Lempel, Ziv 1976). The same authors then turned it into the compression algorithms LZ77 (Ziv, Lempel 1977) and LZ78 (Ziv, Lempel 1978). These last two are the basis of many well-known and widely used compression formats and utilities, such as GIF and compress (via LZW (Welch 1984)) or DEFLATE and gzip (via LZSS (Storer, Szymanski 1982)).</description></item><item><title>Notes on: Generalization over different cellular automata rules learned by a deep feed-forward neural network by Aach, M., Goebbert, J. H., &amp; Jitsev, J. (2021)</title><link>https://hugocisneros.com/notes/aachgeneralizationdifferentcellular2021/</link><pubDate>Mon, 26 Apr 2021 20:53:00 +0200</pubDate><guid>https://hugocisneros.com/notes/aachgeneralizationdifferentcellular2021/</guid><description>source (Aach et al. 2021) tags Cellular automata, Neural networks DONE Summary This paper studies the generalization abilities of neural networks on tasks involving learning the dynamics of cellular automata rules from examples.
Neural networks are trained to predict the next state of a CA from the three previous time steps. Different training examples for a single rule correspond to different initializations.
The authors study three kinds of generalization:
Simple generalization: The network is trained on 300 different CA rules and tested on new, unseen initial configurations from those 300 rules.</description></item><item><title>Generalization in Machine learning</title><link>https://hugocisneros.com/notes/generalization_in_machine_learning/</link><pubDate>Fri, 23 Apr 2021 11:26:00 +0200</pubDate><guid>https://hugocisneros.com/notes/generalization_in_machine_learning/</guid><description> tags Machine learning, Applied maths</description></item><item><title>Notes on: AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence by Clune, J. (2019)</title><link>https://hugocisneros.com/notes/cluneaigasaigeneratingalgorithms2019/</link><pubDate>Fri, 23 Apr 2021 11:13:00 +0200</pubDate><guid>https://hugocisneros.com/notes/cluneaigasaigeneratingalgorithms2019/</guid><description>tags Artificial Intelligence, Genetic algorithms, Open-ended Evolution source (Clune 2019) Summary Nowadays, the design of AI systems is (more or less) approached in one main way: implementing elementary building blocks such as convolutions, skip connections, activation functions, attention, etc. We currently have no clear idea how to combine these relatively successful blocks into a global system that would take advantage of each and every one of them.</description></item><item><title>Matrix factorization</title><link>https://hugocisneros.com/notes/matrix_factorization/</link><pubDate>Wed, 21 Apr 2021 15:34:00 +0200</pubDate><guid>https://hugocisneros.com/notes/matrix_factorization/</guid><description>tags Mathematics LU Factorization resources Nick Higham&amp;rsquo;s blog An LU factorization of an \(n \times n\) matrix \(A\) is a factorization \(A = LU\), where \(L\) is lower triangular and \(U\) is upper triangular.
LUP factorization LU factorization with partial pivoting: \(PA = LU\) with \(P\) a permutation matrix.
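Below is a plain-Python sketch of LU factorization with partial pivoting (Doolittle form, with a unit lower-triangular L), written for clarity rather than speed. The name `lup` is illustrative. At each column, the row whose pivot has the largest absolute value is swapped into place, and the permutation P records these swaps.

```python
def lup(A):
    """Return (P, L, U) with P A = L U, using partial pivoting.

    P is a permutation matrix, L is unit lower triangular, U is upper triangular.
    """
    n = len(A)
    U = [list(map(float, row)) for row in A]
    L = [[0.0] * n for _ in range(n)]
    perm = list(range(n))  # perm[i] = original row now at position i
    for k in range(n):
        # Pivot: pick the row (at or below k) with the largest |U[i][k]|.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        if U[p][k] == 0:
            raise ValueError("matrix is singular")
        if p != k:
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]  # only columns before k are filled so far
            perm[k], perm[p] = perm[p], perm[k]
        L[k][k] = 1.0
        for i in range(k + 1, n):
            factor = U[i][k] / U[k][k]
            L[i][k] = factor
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]
    P = [[1.0 if j == perm[i] else 0.0 for j in range(n)] for i in range(n)]
    return P, L, U

P, L, U = lup([[1, 2], [3, 4]])
print(P, L, U)
```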
LDU factorization Lower-diagonal-upper factorization: \(A = LDU\) with \(D\) a diagonal matrix and \(L\) and \(U\) unit triangular (triangular with ones on the diagonal).</description></item><item><title>Attractor networks</title><link>https://hugocisneros.com/notes/attractor_networks/</link><pubDate>Mon, 19 Apr 2021 11:24:00 +0200</pubDate><guid>https://hugocisneros.com/notes/attractor_networks/</guid><description>tags Physics, Applied maths, Neural networks resources Scholarpedia Attractor networks are sets of nodes connected in such a way that their dynamics are stable in a small subspace of their phase space. The network state usually resides on this smaller manifold after a few evolution steps.
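A classical Hopfield network is one of the simplest attractor networks; it is used here as an assumed example and is not discussed in this note. Storing a ±1 pattern with a Hebbian outer-product rule makes that pattern a fixed point of the dynamics. Corrupted versions then fall back into it under repeated sign updates. The minimal sketch below stores one pattern, and its names are illustrative.

```python
def train_hopfield(patterns):
    """Hebbian weights: W[i][j] = sum over patterns of x_i * x_j, zero diagonal."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += x[i] * x[j]
    return W

def recall(W, state, sweeps=5):
    """Asynchronous sign updates; each one never increases the network energy."""
    s = list(state)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(W[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if h >= 0 else -1
    return s

pattern = [1, -1, 1, 1, -1, -1, 1, -1]
W = train_hopfield([pattern])
noisy = list(pattern)
noisy[0], noisy[3] = -noisy[0], -noisy[3]  # flip two of the eight units
print(recall(W, noisy) == pattern)  # → True
```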
These networks are often recurrent.</description></item><item><title>Boolean networks</title><link>https://hugocisneros.com/notes/boolean_networks/</link><pubDate>Fri, 16 Apr 2021 14:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/boolean_networks/</guid><description>tags Complex Systems A generalization of Cellular automata Boolean networks could be seen as a generalization of CA to any topology (not necessarily 1D or 2D). In the standard model, each node of the network is assigned a rule randomly chosen from the \(2^{2^K}\) possible Boolean functions of its \(K\) inputs.
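Here is a minimal sketch of this standard random Boolean network model (often called an NK or Kauffman network) with synchronous updates. The functions `make_rbn` and `rbn_step` are illustrative names. Each node draws K input nodes and a random truth table over the 2^K input combinations, which picks one of the 2^(2^K) Boolean functions of K inputs. Since the state space is finite, every trajectory eventually falls onto a cycle, i.e. an attractor.

```python
import random

def make_rbn(n_nodes, k, seed=0):
    """Random Boolean network: each node gets k random inputs and a random
    truth table over the 2**k input combinations."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n_nodes), k) for _ in range(n_nodes)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n_nodes)]
    return inputs, tables

def rbn_step(state, inputs, tables):
    """Synchronous update: every node reads its inputs from the old state."""
    new_state = []
    for node_inputs, table in zip(inputs, tables):
        index = 0
        for j in node_inputs:  # encode the input bits as a table index
            index = 2 * index + state[j]
        new_state.append(table[index])
    return tuple(new_state)

inputs, tables = make_rbn(n_nodes=10, k=2)
state = tuple(random.Random(1).randint(0, 1) for _ in range(10))
seen = {}
t = 0
while state not in seen:  # finite state space: an attractor must be reached
    seen[state] = t
    state = rbn_step(state, inputs, tables)
    t += 1
print("attractor of length", t - seen[state], "reached after", seen[state], "steps")
```

Switching to asynchronous updates would mean updating one node at a time, so that each node reads values already modified within the same sweep.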
As with cellular automata, nodes don&amp;rsquo;t have to be restricted to two states (although the name Boolean then no longer applies), and updates can be either synchronous or asynchronous.</description></item><item><title>Avida</title><link>https://hugocisneros.com/notes/avida/</link><pubDate>Mon, 12 Apr 2021 14:57:00 +0200</pubDate><guid>https://hugocisneros.com/notes/avida/</guid><description>tags Artificial life, Evolution Avida is an Artificial life system, inspired by Tierra, that uses computer programs as individuals.
One advantage of this system is that the complexity of organisms is easy to measure: it can be done by counting the number of instructions in their computer program.</description></item><item><title>Hash functions</title><link>https://hugocisneros.com/notes/hash_functions/</link><pubDate>Mon, 12 Apr 2021 14:57:00 +0200</pubDate><guid>https://hugocisneros.com/notes/hash_functions/</guid><description>tags Computer science Hash functions map variable-sized inputs to a fixed, finite set of outputs.
They need to have a range of properties such as:
Determinism: a given input must always produce the same output. Universality: two distinct inputs should get the same hash with a probability as close to \(1/n\) as possible, where \(n\) is the size of the output set; \(1/n\) is the collision probability of a uniformly random assignment of hashes, essentially the lowest achievable.</description></item><item><title>Illegal numbers</title><link>https://hugocisneros.com/notes/illegal_numbers/</link><pubDate>Mon, 12 Apr 2021 14:29:00 +0200</pubDate><guid>https://hugocisneros.com/notes/illegal_numbers/</guid><description>tags Cryptography, Computer science resources Wikipedia A number that represents some information which is illegal to possess or transmit, making said number technically illegal. It can also refer to numbers that have a particular meaning or connotation that a government wishes to censor.
Focusing on a specific class of numbers, one can create amusing illegal numbers such as:
Illegal primes, illegal Pythagorean triples, illegal triangular numbers, illegal Fibonacci numbers, etc.</description></item><item><title>The Bitter Lesson</title><link>https://hugocisneros.com/notes/the_bitter_lesson/</link><pubDate>Thu, 08 Apr 2021 10:35:00 +0200</pubDate><guid>https://hugocisneros.com/notes/the_bitter_lesson/</guid><description>tags Machine learning, Artificial Intelligence author Richard Sutton resources Link The Bitter Lesson is a pattern that can be observed in several areas of machine learning: many hard problems involving some form of artificial intelligence have seen dramatic progress at some point in the last 50 years, progress that was driven mostly by data and computation rather than by &amp;ldquo;clever&amp;rdquo; human engineering.
If this trend is a fundamental principle (which is what the article argues), it would mean that most of the time spent on engineering features and task-specific representations is wasted.</description></item><item><title>Richard Sutton</title><link>https://hugocisneros.com/notes/richard_sutton/</link><pubDate>Thu, 08 Apr 2021 09:58:00 +0200</pubDate><guid>https://hugocisneros.com/notes/richard_sutton/</guid><description/></item><item><title>Note-taking</title><link>https://hugocisneros.com/notes/note_taking/</link><pubDate>Mon, 29 Mar 2021 14:49:00 +0200</pubDate><guid>https://hugocisneros.com/notes/note_taking/</guid><description> tags Writing Note-taking in Emacs with org-roam</description></item><item><title>Notes on: Intelligence without representation by Brooks, R. A. (1991)</title><link>https://hugocisneros.com/notes/brooksintelligencerepresentation1991/</link><pubDate>Mon, 29 Mar 2021 14:35:00 +0200</pubDate><guid>https://hugocisneros.com/notes/brooksintelligencerepresentation1991/</guid><description>tags Artificial Intelligence source (Brooks 1991) DONE Summary What is intelligence Intelligence cannot be thought of as a collection of building blocks that may one day fall into place to form a coherent whole.
The author argues for another approach to building artificially intelligent systems:
Build systems incrementally, with a complete system at each step, to ensure that the pieces and their interfaces are valid. At each step, build intelligent systems that are let loose in the real world, with real sensing and action.</description></item><item><title>Notes on: The geometry of integration in text classification RNNs by Aitken, K., Ramasesh, V. V., Garg, A., Cao, Y., Sussillo, D., &amp; Maheswaranathan, N. (2020)</title><link>https://hugocisneros.com/notes/aitkengeometryintegrationtext2020/</link><pubDate>Thu, 25 Mar 2021 10:20:00 +0100</pubDate><guid>https://hugocisneros.com/notes/aitkengeometryintegrationtext2020/</guid><description>tags RNN, NLP source (Aitken et al. 2020) DONE Summary This paper takes a dynamical-systems approach to studying learning in RNNs. Training RNNs with gradient descent allows them to learn a simplified form of memory and information processing.
The authors use simple text classification tasks to investigate whether these learned properties can be understood by looking at the hidden-state dynamics of RNNs.
The RNNs usually behave like attractor networks, with the hidden state lying on a low-dimensional manifold.</description></item><item><title>Computer security</title><link>https://hugocisneros.com/notes/computer_security/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/computer_security/</guid><description>Some essential components of computer security:
Cryptography</description></item><item><title>CPPN</title><link>https://hugocisneros.com/notes/cppn/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/cppn/</guid><description> tags Neural networks, Genetic algorithms papers (Stanley 2007) resources Wikipedia Bibliography Kenneth O. Stanley. June 6, 2007. "Compositional Pattern Producing Networks: A Novel Abstraction of Development". Genetic Programming and Evolvable Machines 8 (2):131–62. DOI.</description></item><item><title>Diffusion limited aggregation</title><link>https://hugocisneros.com/notes/diffusion_limited_aggregation/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/diffusion_limited_aggregation/</guid><description> tags Applied maths, Physics</description></item><item><title>Entropy</title><link>https://hugocisneros.com/notes/entropy/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/entropy/</guid><description>tags Complexity metrics references (Shannon, Weaver 1975) For a discrete random variable \(X\) with outcomes \(x_i\), \(P(X=x_i) = P_i\), the entropy or uncertainty function of \(X\) is defined as \[ H(X) = -\sum_{i=1}^{N} P_i \log P_i \]
Entropy is always non-negative, and is maximized when the uncertainty is maximal, that is, when \(P_1 = P_2 = \dots = P_N = \frac{1}{N}\); in that case the entropy is \(\log N\).
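A short numerical check of these properties, assuming logarithms in base 2 (entropy measured in bits); the helper name `entropy` is illustrative.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum_i P_i log2 P_i, with 0 log 0 taken as 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0, i.e. log2(4): the maximum for N = 4
print(entropy([0.7, 0.1, 0.1, 0.1]))      # lower than 2.0: less uncertainty
print(entropy([1.0, 0.0]))                # 0.0: no uncertainty at all
```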
Interpretations:</description></item><item><title>Kolmogorov complexity</title><link>https://hugocisneros.com/notes/kolmogorov_complexity/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/kolmogorov_complexity/</guid><description>tags Complexity, Algorithmic Information theory, Computability theory Definition Invariance theorem For two descriptive languages \(L_1\) and \(L_2\) and their respective associated Kolmogorov complexity functions \(K_1\) and \(K_2\), there exists a constant \(c\), depending only on \(L_1\) and \(L_2\), such that \[ \forall s, -c \leq K_1(s) - K_2(s) \leq c \]
In other words, the Kolmogorov complexities of a string in two description languages differ by at most a constant that does not depend on the string.</description></item><item><title>Minimum description length</title><link>https://hugocisneros.com/notes/minimum_description_length/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/minimum_description_length/</guid><description> tags Complexity, Algorithmic Information theory papers (Grunwald 2007; Grunwald 2004) Bibliography Peter Grunwald. 2007. The Minimum Description Length Principle. MIT Press. Peter Grunwald. June 4, 2004. "A Tutorial Introduction to the Minimum Description Length Principle". http://arxiv.org/abs/math/0406077.</description></item><item><title>Neural tangent kernel</title><link>https://hugocisneros.com/notes/neural_tangent_kernel/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/neural_tangent_kernel/</guid><description>tags Neural networks For a neural network trained to minimize a quadratic loss, the gradient flow can be rewritten from \[ \dot{w} = - \nabla L (w(t)) \] to \[ \dot{w} = - \nabla y(w) (y(w) - \bar{y}) \]
Therefore, the time derivative of \(y\) is \[ \dot{y}(w) = \nabla y(w)^T \dot{w} = - \nabla y(w)^T \nabla y(w) (y(w) - \bar{y}) \] The NTK is the matrix multiplying the residual \((y(w) - \bar{y})\): \(\nabla y(w)^T \nabla y(w)\).</description></item><item><title>Notes on: A model of urban evolution based on innovation diffusion by Raimbault, J. (2020)</title><link>https://hugocisneros.com/notes/raimbaultmodelurbanevolution2020/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/raimbaultmodelurbanevolution2020/</guid><description>source (Raimbault 2020) tags ALife 2020, Complex Systems, Evolution, Urban science Summary This paper studies the concept of innovation diffusion and how it could be seen as a mechanism through which cities evolve.
Modeling this shows that global integration of cities (a fully connected city graph on a territory) is not optimal for efficiently diffusing innovations, which can appear spontaneously in any city.
I am interested in taking an ALife-inspired approach to studying cities, as it can show cities as they could be.</description></item><item><title>Notes on: Evolving Neural Networks through Augmenting Topologies by Stanley, K. O., &amp; Miikkulainen, R. (2002)</title><link>https://hugocisneros.com/notes/stanleyevolvingneuralnetworks2002/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/stanleyevolvingneuralnetworks2002/</guid><description>tags Neural networks, Genetic algorithms, NAS source (Stanley, Miikkulainen 2002) Summary This is the main paper introducing the NEAT system. NEAT is a direct-encoding approach to neuroevolution (evolution of ANNs). The encoding is based on a genome that sequentially specifies each of the connections between nodes of the network. Several tricks make it possible to apply GA methods to evolving networks:
Historical tracking of genes, so that architectures can be aligned and mated.</description></item><item><title>Notes on: Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems by Reinke, C., Etcheverry, M., &amp; Oudeyer, P. (2020)</title><link>https://hugocisneros.com/notes/reinkeintrinsicallymotivateddiscovery2020/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/reinkeintrinsicallymotivateddiscovery2020/</guid><description>source (Reinke et al. 2020) Summary The authors address the problem of automated discovery of diverse self-organized patterns in high-dimensional and complex dynamical systems of the Game-of-Life type. They conduct experiments on Lenia.
Their goal is to use an IMGEP (intrinsically motivated goal exploration process) algorithm to learn representations of interesting patterns and discover them.
Problem setting. Goal: with a budget of \(N\) experiments, maximize the diversity of observations.
A parameter space \(\Theta\) of available parameters \(\theta\), and an observation space \(O\) of observations.</description></item><item><title>Notes on: Network Deconvolution by Ye, C., Evanusa, M., He, H., Mitrokhin, A., Goldstein, T., Yorke, J. A., Fermuller, Cornelia, … (2020)</title><link>https://hugocisneros.com/notes/yenetworkdeconvolution2020/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/yenetworkdeconvolution2020/</guid><description>tags Convolutional neural networks, Neural network training source (Ye et al. 2020) Summary This paper introduces so-called Network Deconvolution, advertised as a way to remove pixel-wise and channel-wise correlation in deep neural networks.
The authors base their new operator on the optimal configuration for \(L_2\) linear regression, where gradient descent converges in a single step if and only if:
\[ \frac{1}{N}X^T X = I \] where \(X\) is the feature matrix and \(N\) the number of samples.</description></item><item><title>Notes on: Neuroevolution: from architectures to learning by Floreano, D., Dürr, P., &amp; Mattiussi, C. (2008)</title><link>https://hugocisneros.com/notes/floreanoneuroevolutionarchitectureslearning2008/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/floreanoneuroevolutionarchitectureslearning2008/</guid><description> tags NAS source (Floreano et al. 2008) TODO Summary TODO Comments Bibliography Dario Floreano, Peter Dürr, Claudio Mattiussi. March 1, 2008. "Neuroevolution: From Architectures to Learning". Evolutionary Intelligence 1 (1):47–62. DOI.</description></item><item><title>Notes on: On the expressive power of programming languages by Felleisen, M. (1991)</title><link>https://hugocisneros.com/notes/felleisenexpressivepowerprogramming1991/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/felleisenexpressivepowerprogramming1991/</guid><description>source (Felleisen 1991) tags Programming languages resources PWL Conf talk DONE Summary Programming languages have different levels of expressiveness. While loops can be used to implement for loops, binary if statements can implement multi-branch if statements, etc.
Turing tarpit: once programming languages are universal, any program can be rewritten in any of them, and the notion of &amp;ldquo;expressiveness&amp;rdquo; of programming languages doesn&amp;rsquo;t make much sense.
For a language \(L\) and \(F + L\), the language obtained by adding some features \(F\) to \(L\), can we say that the second is more expressive than the first?</description></item><item><title>PCA</title><link>https://hugocisneros.com/notes/pca/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/pca/</guid><description> tags Data representation</description></item><item><title>Quality diversity</title><link>https://hugocisneros.com/notes/quality_diversity/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/quality_diversity/</guid><description>tags Evolution, Reinforcement learning, Search algorithms papers (Pugh et al. 2016; Cully, Demiris 2017) QD is about creating algorithms that favor diversity when searching the space. In QD, one needs to both:
Measure the quality of a solution, and have a way to describe its effect (behavior). Solutions in QD have to be good in both respects: high quality and diverse behaviors.
QD is also a form of novelty search.
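MAP-Elites is one well-known QD algorithm that makes both requirements concrete: the behavior descriptor selects a cell in a grid, and each cell keeps only its best (elite) solution. The sketch below is a toy illustration under stated assumptions: solutions are points in the unit square, the descriptor is their grid cell, and the quality function is an arbitrary placeholder; all names and parameters are hypothetical.

```python
import numpy as np

def fitness(x):
    """Quality of a solution (placeholder: closeness to the centre of the unit square)."""
    return -float(np.sum((x - 0.5) ** 2))

def descriptor(x, bins=5):
    """Behaviour descriptor: index of the grid cell (bins x bins) containing x."""
    return tuple(int(v) for v in np.minimum((x * bins).astype(int), bins - 1))

def map_elites(iterations=2000, bins=5, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    archive = {}  # cell -> (fitness, solution): the elite found so far in each cell
    for it in range(iterations):
        if it >= 100:
            # Mutate an elite picked uniformly at random from the archive
            keys = list(archive)
            parent = archive[keys[rng.integers(len(keys))]][1]
            x = np.clip(parent + rng.normal(0.0, sigma, size=2), 0.0, 1.0)
        else:
            x = rng.random(2)  # random initialisation phase
        cell, f = descriptor(x, bins), fitness(x)
        # Keep x if its cell is empty or if it beats the current elite of that cell
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)
    return archive

archive = map_elites()
print(len(archive), "of", 5 * 5, "cells filled")
```

The archive ends up holding one solution per behavior cell, so the result is diverse by construction while each entry is the best quality found for its cell.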
Bibliography Justin K.</description></item><item><title>Turing degree</title><link>https://hugocisneros.com/notes/turing_degree/</link><pubDate>Thu, 25 Mar 2021 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/turing_degree/</guid><description>tags Computability theory The idea behind Turing degrees is, in the world of computation, similar to the notion of cardinality of infinite sets (\(\aleph_0, \aleph_1, \dots\)).
A Turing degree is an equivalence class for Turing equivalence. Two sets \(X\) and \(Y\) are Turing equivalent when a Turing machine with an oracle for membership in \(Y\) can decide membership in \(X\) (there is a way to formulate the membership problem for \(X\) as a problem for \(Y\)), and vice versa.</description></item><item><title>Assembly theory</title><link>https://hugocisneros.com/notes/assembly_theory/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/assembly_theory/</guid><description>tags Complexity metrics papers (Marshall et al. 2019) This complexity metric is based on ideas similar to Logical depth, where instead of just looking at the general process that led to the creation of an object, we also look at the number of elementary steps in that process.
Bibliography Stuart M Marshall, Douglas G Moore, Alastair R G Murray, Sara I Walker. July 2019. "Quantifying the Pathways to Life Using Assembly Spaces"</description></item><item><title>CMA-ES</title><link>https://hugocisneros.com/notes/cma_es/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/cma_es/</guid><description> tags Evolutionary algorithms</description></item><item><title>Notes on: A Computer Scientist's View of Life, the Universe, and Everything by Schmidhuber, J. (1999)</title><link>https://hugocisneros.com/notes/schmidhubercomputerscientistview1999/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/schmidhubercomputerscientistview1999/</guid><description> tags Zuse&amp;rsquo;s thesis source (Schmidhuber 1999) Summary Comments Bibliography Jürgen Schmidhuber. April 13, 1999. "A Computer Scientist's View of Life, the Universe, and Everything". http://arxiv.org/abs/quant-ph/9904050.</description></item><item><title>Notes on: A new structurally dissolvable self-reproducing loop evolving in a simple cellular automata space by Sayama, H. (1999)</title><link>https://hugocisneros.com/notes/sayamanewstructurallydissolvable1999/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/sayamanewstructurallydissolvable1999/</guid><description>source (Sayama 1999) tags Cellular automata, Evolution Summary This work presents a simple evolutionary system based on Langton&amp;rsquo;s self-reproducing loop. It is implemented entirely as an ordinary CA defined by a state-transition rule. The initial structure of the loop was modified to catch variations. An interesting property of this evolving system is its natural tendency to evolve towards smaller loops, even though no stochastic mutation is hard-coded.
Bibliography Hiroki Sayama. 1999. "A New Structurally Dissolvable Self-Reproducing Loop Evolving in a Simple Cellular Automata Space"</description></item><item><title>Notes on: Adapting to Unseen Environments through Explicit Representation of Context by Tutum, C., &amp; Miikkulainen, R. (2020)</title><link>https://hugocisneros.com/notes/tutumadaptingunseenenvironments2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/tutumadaptingunseenenvironments2020/</guid><description>source (Tutum, Miikkulainen 2020) tags Meta-learning, Reinforcement learning, ALife 2020 Summary This work introduces the idea of Context-Skill networks for continuous RL tasks. Experiments are done on a Flappy Bird-like game.
The authors use an LSTM as a context network to make part of the prediction and a feed-forward neural network as a skill network. They show that, in that game, using both networks achieves better performance than using a single one.</description></item><item><title>Notes on: An Integrated Perspective on the Constitutive and Interactive Dimensions of Autonomy by Beer, R. D. (2020)</title><link>https://hugocisneros.com/notes/beerintegratedperspectiveconstitutive2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/beerintegratedperspectiveconstitutive2020/</guid><description>tags Emergence, Life, Cellular automata source (Beer 2020) Summary Constitution: &amp;ldquo;How emergent individuals are put together and maintained&amp;rdquo; Interaction: &amp;ldquo;How emergent individuals as a whole engage with the environment&amp;rdquo;
The paper uses Conway&amp;rsquo;s Game of Life as a toy model: the cells and the update rule play the role of the physics of a universe, and this physics gives rise to a simple chemistry, which can in turn support self-sustaining networks of reactions and some form of biology.</description></item><item><title>Notes on: Cellular automata as convolutional neural networks by Gilpin, W. (2018)</title><link>https://hugocisneros.com/notes/gilpincellularautomataconvolutional2018/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/gilpincellularautomataconvolutional2018/</guid><description>tags Cellular automata as CNNs source (Gilpin 2018) Summary This is one of the only attempts I have come across to represent a CA rule as a CNN. The author uses a deep CNN to learn a rule and studies various information-theoretic quantities in the activation patterns to evaluate the complexity of the rules.
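To illustrate why a CA rule maps naturally onto a CNN: one Game of Life step is a \(3 \times 3\) convolution (counting neighbours) followed by a pointwise nonlinearity (the birth/survival rule). This sketch is an illustration, not code from the paper; it assumes a periodic grid and builds the convolution from shifted copies of the grid with `np.roll`.

```python
import numpy as np

def life_step(grid):
    """One Game of Life step on a periodic (toroidal) grid of 0/1 cells."""
    # Neighbour count: a 3x3 convolution with a kernel of ones and a zero centre,
    # written as a sum of shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, di, axis=0), dj, axis=1)
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    # Pointwise "activation": birth with 3 neighbours, survival with 2 or 3.
    born = neighbours == 3
    survives = np.logical_and(grid == 1, neighbours == 2)
    return np.logical_or(born, survives).astype(int)

glider = np.zeros((6, 6), dtype=int)
glider[0, 1] = glider[1, 2] = glider[2, 0] = glider[2, 1] = glider[2, 2] = 1
g = glider
for _ in range(4):
    g = life_step(g)
# After 4 steps the glider reappears, shifted one cell down and one cell right
print(np.array_equal(g, np.roll(np.roll(glider, 1, axis=0), 1, axis=1)))  # True
```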
Comments I am personally very interested in this paper, since it is a promising direction for creating neural-network-based rules that can be sampled and efficiently stored and applied.</description></item><item><title>Notes on: Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data by Bender, E. M., &amp; Koller, A. (2020)</title><link>https://hugocisneros.com/notes/benderclimbingnlumeaning2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/benderclimbingnlumeaning2020/</guid><description>source (Bender, Koller 2020) tags NLP, Artificial Intelligence, Evaluating NLP Summary The main point of the article could be summarized as follows:
We argue that the language modeling task, because it only uses form as training data, cannot in principle lead to learning of meaning. We take the term language model to refer to any system trained only on the task of string prediction, whether it operates over characters, words or sentences, and sequentially or not.</description></item><item><title>Notes on: Combinatory Chemistry: Towards a Simple Model of Emergent Evolution by Kruszewski, G., &amp; Mikolov, T. (2020)</title><link>https://hugocisneros.com/notes/kruszewskicombinatorychemistrysimple2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/kruszewskicombinatorychemistrysimple2020/</guid><description>tags Artificial life, Combinatory logic source (Kruszewski, Mikolov 2020) Summary This is Kruszewski&amp;rsquo;s approach to artificial life, based on artificial chemistry. Combinatory logic is used as a basis for this system. Conservation laws are added on top of the set of rules that make combinatory logic Turing complete. This is then used to observe interesting dynamics and pre-life-like processes.
Comments Bibliography Germán Kruszewski, Tomas Mikolov. March 17, 2020. "Combinatory Chemistry: Towards a Simple Model of Emergent Evolution"</description></item><item><title>Notes on: Complexity and evolution: What everybody knows by McShea, D. W. (1991)</title><link>https://hugocisneros.com/notes/mcsheacomplexityevolutionwhat1991/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/mcsheacomplexityevolutionwhat1991/</guid><description> tags Evolution, Complexity source (McShea 1991) TODO Summary TODO Comments Bibliography Daniel W. McShea. July 1991. "Complexity and Evolution: What Everybody Knows". Biology &amp; Philosophy 6 (3):303–24. DOI.</description></item><item><title>Notes on: Curiosity-Driven Exploration by Self-Supervised Prediction by Pathak, D., Agrawal, P., Efros, A. A., &amp; Darrell, T. (2017)</title><link>https://hugocisneros.com/notes/pathakcuriositydrivenexplorationselfsupervised2017/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/pathakcuriositydrivenexplorationselfsupervised2017/</guid><description>tags Reinforcement learning source (Pathak et al. 2017) DONE Summary This paper presents a curiosity-based method for training RL agents. These agents are given a reward \(r_t\) which is the sum of an intrinsic and an extrinsic reward. The latter is mostly (if not always) 0, while the former is constructed progressively during exploration by an Intrinsic Curiosity Module (ICM).
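A toy sketch of how the ICM intrinsic reward behaves: it is the forward model prediction error in feature space, \(r^i_t = \frac{\eta}{2}\lVert \hat{\phi}(s_{t+1}) - \phi(s_{t+1}) \rVert^2\), added to the extrinsic reward. As hypothetical stand-ins for the learned networks of the paper, \(\phi\) is a fixed random projection and the forward model is linear; all names and sizes are illustrative. Repeated visits to the same transition make it predictable, so its intrinsic reward shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, FEAT_DIM, N_ACTIONS = 8, 4, 3
# Hypothetical stand-in for the learned feature encoder phi: a fixed random projection.
Phi = rng.normal(size=(FEAT_DIM, STATE_DIM)) / np.sqrt(STATE_DIM)

def phi(s):
    return np.tanh(Phi @ s)

def forward_input(s_t, a_t):
    """Concatenate phi(s_t) with a one-hot encoding of the discrete action a_t."""
    return np.concatenate([phi(s_t), np.eye(N_ACTIONS)[a_t]])

def intrinsic_reward(F, s_t, a_t, s_next, eta=1.0):
    """r_i = (eta / 2) * ||F [phi(s_t); a_t] - phi(s_{t+1})||^2 for a linear forward model F."""
    err = F @ forward_input(s_t, a_t) - phi(s_next)
    return 0.5 * eta * float(err @ err)

def update_forward_model(F, s_t, a_t, s_next, lr=0.1):
    """One gradient step on the forward-model loss 0.5 * ||prediction error||^2."""
    x = forward_input(s_t, a_t)
    err = F @ x - phi(s_next)
    return F - lr * np.outer(err, x)

F = np.zeros((FEAT_DIM, FEAT_DIM + N_ACTIONS))
s_t, s_next, a_t = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM), 1
r_ext = 0.0  # extrinsic reward: mostly 0 in this setting
for visit in range(5):
    r_t = intrinsic_reward(F, s_t, a_t, s_next) + r_ext  # r_t = r_i + r_e
    print(f"visit {visit}: r_t = {r_t:.4f}")  # decreases as the transition becomes predictable
    F = update_forward_model(F, s_t, a_t, s_next)
```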
The module is illustrated below (figure from the paper).</description></item><item><title>Notes on: Developmental mappings and phenotypic complexity by Lehre, P. K., &amp; Haddow, P. C. (2003)</title><link>https://hugocisneros.com/notes/lehredevelopmentalmappingsphenotypic2003/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/lehredevelopmentalmappingsphenotypic2003/</guid><description>tags Cellular automata source (Lehre, Haddow 2003) Summary The approach of the paper is to use a genotype/phenotype distance correlation plot to study the complexity of a system that is determined by a genotype and exhibits some phenotypic behavior. This amounts to simply plotting the distance between two phenotypes (for CAs, the Hamming distance between the states after 100 iterations starting from a single activated cell) against the distance between the two genotypes (for a CA, the Hamming distance between the rules).</description></item><item><title>Notes on: Diversity preservation in minimal criterion coevolution through resource limitation by Brant, J. C., &amp; Stanley, K. O. (2020)</title><link>https://hugocisneros.com/notes/brantdiversitypreservationminimal2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/brantdiversitypreservationminimal2020/</guid><description> source (Brant, Stanley 2020) tags Co-evolution, Evolutionary algorithms Summary Comments Bibliography Jonathan C. Brant, Kenneth O. Stanley. June 25, 2020. "Diversity Preservation in Minimal Criterion Coevolution Through Resource Limitation". In Proceedings of the 2020 Genetic and Evolutionary Computation Conference, 58–66. ACM. DOI.</description></item><item><title>Notes on: Drinking from a Firehose: Continual Learning with Web-scale Natural Language by Hu, H., Sener, O., Sha, F., &amp; Koltun, V. 
(2020)</title><link>https://hugocisneros.com/notes/hudrinkingfirehosecontinual2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/hudrinkingfirehosecontinual2020/</guid><description>tags Continual learning source (Hu et al. 2020) DONE Summary This paper focuses on the problem of (self-)supervised continual learning with deep neural networks. The Firehose dataset introduced by the authors is a large database of timestamped tweets. The goal is to learn a language model for each user from the dataset, which is called Personalized online language learning (POLL).
The authors also introduce a new extension of gradient descent for continual learning.</description></item><item><title>Notes on: Evolved Open-Endedness, Not Open-Ended Evolution by Pattee, H. H., &amp; Sayama, H. (2019)</title><link>https://hugocisneros.com/notes/patteeevolvedopenendednessnot2019/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/patteeevolvedopenendednessnot2019/</guid><description>source (Pattee, Sayama 2019) Summary Evolution need not have been inherently open-ended in nature: starting from a simple cell evolving in a complex self-organising environment, new mechanisms might have been created by the organisms themselves, effectively making the process &amp;ldquo;more&amp;rdquo; open-ended. Symbolic languages are a striking example of this phenomenon: an open-ended descriptive power where the complexity of the environment is not limiting, because language can refer to itself recursively to build on its complexity.</description></item><item><title>Notes on: Information-Theoretic Probing with Minimum Description Length by Voita, E., &amp; Titov, I. (2020)</title><link>https://hugocisneros.com/notes/voitainformationtheoreticprobingminimum2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/voitainformationtheoreticprobingminimum2020/</guid><description> tags Evaluating NLP, Transformers, Minimum description length source (Voita, Titov 2020) TODO Summary TODO Comments Bibliography Elena Voita, Ivan Titov. March 27, 2020. "Information-Theoretic Probing with Minimum Description Length". http://arxiv.org/abs/2003.12298.</description></item><item><title>Notes on: Learning Transferable Architectures for Scalable Image Recognition by Zoph, B., Vasudevan, V., Shlens, J., &amp; Le, Q. V. 
(2018)</title><link>https://hugocisneros.com/notes/zophlearningtransferablearchitectures2018/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/zophlearningtransferablearchitectures2018/</guid><description>tags NAS source (Zoph et al. 2018) Summary This paper is more or less a follow-up of (Zoph, Le 2017) in which the search space is widened while more constraints are added (a division between a normal cell for processing and a reduction cell for pooling/downsampling). Normal cells are stacked \(N\) times, resulting in very big architectures. NASNet is created by searching for those cells, but the actual number of stacked cells and the number of filters of the penultimate layer are searched separately.</description></item><item><title>Notes on: Modeling systems with internal state using evolino by Wierstra, D., Gomez, F. J., &amp; Schmidhuber, J. (2005)</title><link>https://hugocisneros.com/notes/wierstramodelingsystemsinternal2005/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/wierstramodelingsystemsinternal2005/</guid><description> tags Genetic algorithms, Recurrent neural networks source (Wierstra et al. 2005) TODO Summary TODO Comments Bibliography Daan Wierstra, Faustino J. Gomez, Jürgen Schmidhuber. 2005. "Modeling Systems with Internal State Using Evolino". In Proceedings of the 2005 Conference on Genetic and Evolutionary Computation - GECCO '05, 1795. ACM Press. DOI.</description></item><item><title>Notes on: Molecule Attention Transformer by Maziarka, Ł., Danel, T., Mucha, S., Rataj, K., Tabor, J., &amp; Jastrzębski, S. (2020)</title><link>https://hugocisneros.com/notes/maziarkamoleculeattentiontransformer2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/maziarkamoleculeattentiontransformer2020/</guid><description> tags Neural networks source (Maziarka et al. 
2020) TODO Summary TODO Comments Bibliography Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski. February 19, 2020. "Molecule Attention Transformer". http://arxiv.org/abs/2002.08264.</description></item><item><title>Notes on: Neural Architecture Search with Reinforcement Learning by Zoph, B., &amp; Le, Q. V. (2017)</title><link>https://hugocisneros.com/notes/zophneuralarchitecturesearch2017/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/zophneuralarchitecturesearch2017/</guid><description>tags NAS source (Zoph, Le 2017) Summary This paper introduces the idea of using an RNN controller system to generate the operations of a neural network. In a first setting, the authors use this method to construct CNNs. The controller samples an architecture, the architecture is built and trained, and the controller is rewarded with the maximum validation accuracy of the last 5 epochs cubed (??).
Another experiment uses this exploration method to produce recurrent cells, through a complicated model based on a tree of units, for each of which the controller samples an operation.</description></item><item><title>Notes on: Neural Circuit Policies Enabling Auditable Autonomy by Lechner, M., Hasani, R., Amini, A., Henzinger, T. A., Rus, D., &amp; Grosu, R. (2020)</title><link>https://hugocisneros.com/notes/lechnerneuralcircuitpolicies2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/lechnerneuralcircuitpolicies2020/</guid><description>source (Lechner et al. 2020) tags Neural networks DONE Summary This article introduces a type of RNN called Neural Circuit Policies (NCP). This architecture is said to be inspired by the wiring diagram of the C. elegans nematode.
The main building block is a type of Recurrent neural network called liquid time-constant (LTC) network, introduced in (Hasani et al. 2020).
LTC Neurons These neurons are bio-inspired. For a neuron \(i\) in state \(x_i(t)\) receiving a synapse from neuron \(j\), the continuous temporal evolution is described by an ODE: \[ \dot{x}_i = - \left(\frac{1}{\tau_i} + \frac{w_{ij}}{C_{m_i}} \sigma_i(x_j) \right) x_i + \left( \frac{x_{\text{leak}_i}}{\tau_i}+ \frac{w_{ij}}{C_{m_i}} \sigma_i(x_j) E_{ij} \right) \]</description></item><item><title>Notes on: On Adversarial Mixup Resynthesis by Beckham, C., Honari, S., Verma, V., Lamb, A., Ghadiri, F., Hjelm, R. D., Bengio, Y., … (2019)</title><link>https://hugocisneros.com/notes/beckhamadversarialmixupresynthesis2019/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/beckhamadversarialmixupresynthesis2019/</guid><description> tags Autoencoders source (Beckham et al. 2019) TODO Summary TODO Comments Bibliography Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R. Devon Hjelm, Yoshua Bengio, Christopher Pal. October 23, 2019. "On Adversarial Mixup Resynthesis". http://arxiv.org/abs/1903.02709.</description></item><item><title>Notes on: PCGRL: Procedural Content Generation via Reinforcement Learning by Khalifa, A., Bontrager, P., Earle, S., &amp; Togelius, J. (2020)</title><link>https://hugocisneros.com/notes/khalifapcgrlproceduralcontent2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/khalifapcgrlproceduralcontent2020/</guid><description> tags Reinforcement learning source (Khalifa et al. 2020) TODO Summary TODO Comments Bibliography Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius. January 24, 2020. "PCGRL: Procedural Content Generation via Reinforcement Learning". http://arxiv.org/abs/2001.09212.</description></item>
<item><title>Notes on: POET: open-ended coevolution of environments and their optimized solutions by Wang, R., Lehman, J., Clune, J., &amp; Stanley, K. O. (2019)</title><link>https://hugocisneros.com/notes/wangpoetopenendedcoevolution2019/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/wangpoetopenendedcoevolution2019/</guid><description>tags Open-ended Evolution, Reinforcement learning source (Wang et al. 2019) Summary This paper introduces the POET (Paired Open-Ended Trailblazer) framework. The core idea behind this framework is to build a system in which agents learn complex behavior through the joint evolution of agents and their environments: the better the agents, the more complex the environments they can be given.
There are 3 main components to the algorithm: an evolutionary process that mutates the environments themselves, resembling a genetic algorithm; an evolution strategy (ES) that optimizes the agents (although these agents could also be trained with RL); and a transfer mechanism whereby agents trained in a particular environment can be tried in, and further trained on, another one.</description></item><item><title>Notes on: Regenerating Soft Robots through Neural Cellular Automata by Horibe, K., Walker, K., &amp; Risi, S. (2021)</title><link>https://hugocisneros.com/notes/horiberegeneratingsoftrobots2021/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/horiberegeneratingsoftrobots2021/</guid><description> tags Cellular automata, Reinforcement learning source (Horibe et al. 2021) TODO Summary The authors explore neural cellular automata (Mordvintsev et al. 2020) as a framework for growing soft robots.
TODO Comments Bibliography Kazuya Horibe, Kathryn Walker, Sebastian Risi. February 7, 2021. "Regenerating Soft Robots Through Neural Cellular Automata". http://arxiv.org/abs/2102.02579. Alexander Mordvintsev, Ettore Randazzo, Eyvind Niklasson, Michael Levin. February 11, 2020. "Growing Neural Cellular Automata". Distill 5 (2):e23. DOI.&amp;nbsp;See notes</description></item><item><title>Notes on: Reservoir Computing in Artificial Spin Ice by Jensen, J. H., &amp; Tufte, G. (2020)</title><link>https://hugocisneros.com/notes/jensenreservoircomputingartificial2020/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/jensenreservoircomputingartificial2020/</guid><description>source (Jensen, Tufte 2020) tags Reservoir computing, Complex Systems Summary This talk is about artificial spin ice. This model is based on a grid of coupled magnets that can be controlled with a magnetic field. The kind of behavior one may observe in such systems can vary greatly with the geometry of that grid.
The authors want to use the spin ice model for reservoir computing. They measure useful quantities such as kernel quality \(K\) (ability to separate inputs) and generalization capabilities \(G\) (how similar inputs yield similar results).</description></item><item><title>Notes on: Seeking open-ended evolution in Swarm Chemistry by Sayama, H. (2011)</title><link>https://hugocisneros.com/notes/sayamaseekingopenendedevolution2011/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/sayamaseekingopenendedevolution2011/</guid><description> tags Open-ended Evolution source (Sayama 2011) TODO Summary TODO Comments Bibliography Hiroki Sayama. April 2011. "Seeking Open-Ended Evolution in Swarm Chemistry". In 2011 IEEE Symposium on Artificial Life (ALIFE), 186–93. IEEE. DOI.</description></item><item><title>Notes on: Spontaneous fine-tuning to environment in many-species chemical reaction networks by Horowitz, J. M., &amp; England, J. L. (2017)</title><link>https://hugocisneros.com/notes/horowitzspontaneousfinetuningenvironment2017/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/horowitzspontaneousfinetuningenvironment2017/</guid><description> tags Chemical reaction network, Biological life source (Horowitz, England 2017) TODO Summary TODO Comments Bibliography Jordan M. Horowitz, Jeremy L. England. July 18, 2017. "Spontaneous Fine-Tuning to Environment in Many-Species Chemical Reaction Networks". Proceedings of the National Academy of Sciences 114 (29). National Academy of Sciences:7565–70. DOI.</description></item><item><title>Notes on: The Architecture of Complexity by Simon, H. A. 
(1962)</title><link>https://hugocisneros.com/notes/simonarchitecturecomplexity1962/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/simonarchitecturecomplexity1962/</guid><description>tags Complexity, Complex Systems source (Simon 1962) Complex systems In such systems, the whole is more than the sum of the parts, not in an ultimate, metaphysical sense, but in the important pragmatic sense that, given the properties of the parts and the laws of their interaction, it is not a trivial matter to infer the properties of the whole. In the face of complexity, an in-principle reductionist may be at the same time a pragmatic holist.</description></item><item><title>Notes on: Transition phenomena in cellular automata rule space by Li, W., Packard, N. H., &amp; Langton, C. G. (1990)</title><link>https://hugocisneros.com/notes/litransitionphenomenacellular1990/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/litransitionphenomenacellular1990/</guid><description>tags Cellular automata source (Li et al. 1990) Summary This foundational paper follows Langton&amp;rsquo;s work on chaos and the lambda parameter. It uses information-theoretic measures to try and understand the structure of the space of CA rules. The authors come up with a classification in 6 classes:
1. Spatially homogeneous fixed points
2. Spatially inhomogeneous fixed points
3. Periodic behavior
4. Locally chaotic behavior
5. Chaotic behavior
6. Complex behavior
Wolfram&amp;rsquo;s class I is equivalent to class 1, and class II is equivalent to classes 2, 3 and 4.</description></item><item><title>Recurrent neural networks</title><link>https://hugocisneros.com/notes/recurrent_neural_networks/</link><pubDate>Thu, 25 Mar 2021 09:57:00 +0100</pubDate><guid>https://hugocisneros.com/notes/recurrent_neural_networks/</guid><description> tags Neural networks, Machine learning</description></item><item><title>Algorithmic Information theory</title><link>https://hugocisneros.com/notes/algorithmic_information_theory/</link><pubDate>Thu, 25 Mar 2021 09:56:00 +0100</pubDate><guid>https://hugocisneros.com/notes/algorithmic_information_theory/</guid><description/></item><item><title>Convolutional neural networks</title><link>https://hugocisneros.com/notes/convolutional_neural_networks/</link><pubDate>Thu, 25 Mar 2021 09:56:00 +0100</pubDate><guid>https://hugocisneros.com/notes/convolutional_neural_networks/</guid><description> tags Neural networks</description></item><item><title>Gödel's theorem</title><link>https://hugocisneros.com/notes/godel_s_theorem/</link><pubDate>Thu, 25 Mar 2021 09:56:00 +0100</pubDate><guid>https://hugocisneros.com/notes/godel_s_theorem/</guid><description>tags Logic resources Stanford encyclopedia of Philosophy First incompleteness theorem Any consistent formal system F within which a certain amount of elementary arithmetic can be carried out is incomplete; i.e., there are statements of the language of F which can neither be proved nor disproved in F.
Panu Raatikainen
This theorem was followed by several closely related results, such as the undecidability of Turing&amp;rsquo;s Halting problem</description></item><item><title>Halting probability</title><link>https://hugocisneros.com/notes/halting_probability/</link><pubDate>Thu, 25 Mar 2021 09:56:00 +0100</pubDate><guid>https://hugocisneros.com/notes/halting_probability/</guid><description> tags Computability theory, Algorithmic Information theory, Halting problem</description></item><item><title>Ordinary least squares</title><link>https://hugocisneros.com/notes/ordinary_least_squares/</link><pubDate>Thu, 25 Mar 2021 09:56:00 +0100</pubDate><guid>https://hugocisneros.com/notes/ordinary_least_squares/</guid><description> tags Applied maths, Optimization</description></item><item><title>Sed utility</title><link>https://hugocisneros.com/notes/sed_utility/</link><pubDate>Thu, 25 Mar 2021 09:56:00 +0100</pubDate><guid>https://hugocisneros.com/notes/sed_utility/</guid><description>tags Coding In-place batch file manipulation Delete the same line in many files Let&amp;rsquo;s start by creating a simple text file with three lines. This is what it looks like:
printf &amp;#34;Hello\nto the\nworld\n&amp;#34; &amp;gt; test.txt cat test.txt Hello to the world We use sed to remove the lines of the file that match a regex. The -i.bak option edits the file in place and keeps a backup of the original as test.txt.bak.
sed -i.bak &amp;#39;/to the/d&amp;#39; test.txt cat test.</description></item><item><title>Turing test</title><link>https://hugocisneros.com/notes/turing_test/</link><pubDate>Wed, 24 Mar 2021 10:40:00 +0100</pubDate><guid>https://hugocisneros.com/notes/turing_test/</guid><description>tags Artificial intelligence test This is probably one of the most famous test for artificial intelligence. It was elaborated by Alan Turing.</description></item><item><title>Network programming</title><link>https://hugocisneros.com/notes/network_programming/</link><pubDate>Wed, 24 Mar 2021 09:42:00 +0100</pubDate><guid>https://hugocisneros.com/notes/network_programming/</guid><description>tags Programming, Networking An incredible resource for low-level network programming: Beej&amp;rsquo;s guide to network programming.</description></item><item><title>Networking</title><link>https://hugocisneros.com/notes/networking/</link><pubDate>Wed, 24 Mar 2021 09:41:00 +0100</pubDate><guid>https://hugocisneros.com/notes/networking/</guid><description> tags Programming</description></item><item><title>Lambda calculus</title><link>https://hugocisneros.com/notes/lambda_calculus/</link><pubDate>Wed, 03 Mar 2021 14:47:00 +0100</pubDate><guid>https://hugocisneros.com/notes/lambda_calculus/</guid><description> tags Computer science</description></item><item><title>Von Neumann's self-reproducing CA</title><link>https://hugocisneros.com/notes/von_neumann_s_self_reproducing_ca/</link><pubDate>Wed, 03 Mar 2021 08:45:00 +0100</pubDate><guid>https://hugocisneros.com/notes/von_neumann_s_self_reproducing_ca/</guid><description> tags Cellular automata, John Von Neumann</description></item><item><title>Automatic differentiation</title><link>https://hugocisneros.com/notes/automatic_differentiation/</link><pubDate>Wed, 03 Mar 2021 08:43:00 +0100</pubDate><guid>https://hugocisneros.com/notes/automatic_differentiation/</guid><description> tags Applied maths, Optimization</description></item><item><title>Why 
programming is a good medium for expressing poorly understood and sloppily-formulated ideas</title><link>https://hugocisneros.com/notes/why_programming_is_a_good_medium/</link><pubDate>Tue, 02 Mar 2021 10:17:00 +0100</pubDate><guid>https://hugocisneros.com/notes/why_programming_is_a_good_medium/</guid><description>source Link tags Artificial Intelligence, Coding author Marvin Minsky What can computers do? The fallacy under discussion is the widespread superstition that we can&amp;rsquo;t write a computer program to do something unless one has an extremely clear, precise formulation of what is to be done, and exactly how to do it.
It is generally believed that computer programs cannot be more than a set of rules and instructions for what to do in a given computer state.</description></item><item><title>Article: Why Sex? Biologists Find New Explanations</title><link>https://hugocisneros.com/notes/why_sex_biologists_find_new_explanations/</link><pubDate>Tue, 02 Mar 2021 10:15:00 +0100</pubDate><guid>https://hugocisneros.com/notes/why_sex_biologists_find_new_explanations/</guid><description> source Link tags Biological life, Evolution</description></item><item><title>Wirth's law</title><link>https://hugocisneros.com/notes/wirth_s_law/</link><pubDate>Tue, 02 Mar 2021 10:15:00 +0100</pubDate><guid>https://hugocisneros.com/notes/wirth_s_law/</guid><description>tags Computer science resources Wikipedia It is an adage which states that software is getting slower more rapidly than hardware is becoming faster.</description></item><item><title>Graphs</title><link>https://hugocisneros.com/notes/graphs/</link><pubDate>Wed, 24 Feb 2021 11:33:00 +0100</pubDate><guid>https://hugocisneros.com/notes/graphs/</guid><description> tags Mathematics, Computer science</description></item><item><title>Message-passing graph networks</title><link>https://hugocisneros.com/notes/message_passing_graph_networks/</link><pubDate>Wed, 24 Feb 2021 11:33:00 +0100</pubDate><guid>https://hugocisneros.com/notes/message_passing_graph_networks/</guid><description> tags Graph neural networks</description></item><item><title>Attention graph networks</title><link>https://hugocisneros.com/notes/attention_graph_networks/</link><pubDate>Wed, 24 Feb 2021 11:32:00 +0100</pubDate><guid>https://hugocisneros.com/notes/attention_graph_networks/</guid><description> tags Graph neural networks, Attention</description></item><item><title>Compilation</title><link>https://hugocisneros.com/notes/compilation/</link><pubDate>Mon, 22 Feb 2021 13:44:00 +0100</pubDate><guid>https://hugocisneros.com/notes/compilation/</guid><description> tags Computer 
science Compilation is the act of converting code from one programming language to another, typically from a high-level language into a lower-level one such as machine code.
Some compiled languages: C Programming language, Rust, C++</description></item><item><title>Regular expressions</title><link>https://hugocisneros.com/notes/regular_expressions/</link><pubDate>Tue, 16 Feb 2021 21:10:00 +0100</pubDate><guid>https://hugocisneros.com/notes/regular_expressions/</guid><description> tags Coding, NLP</description></item><item><title>Novelty search</title><link>https://hugocisneros.com/notes/novelty_search/</link><pubDate>Thu, 11 Feb 2021 09:34:00 +0100</pubDate><guid>https://hugocisneros.com/notes/novelty_search/</guid><description> tags Search</description></item><item><title>Search algorithms</title><link>https://hugocisneros.com/notes/search_algorithms/</link><pubDate>Wed, 10 Feb 2021 15:16:00 +0100</pubDate><guid>https://hugocisneros.com/notes/search_algorithms/</guid><description> tags Algorithm</description></item><item><title>Talk: The Importance of Open-Endedness in AI and Machine Learning</title><link>https://hugocisneros.com/notes/talk_the_importance_of_open_endedness_in_ai_and_machine_learning/</link><pubDate>Wed, 10 Feb 2021 11:03:00 +0100</pubDate><guid>https://hugocisneros.com/notes/talk_the_importance_of_open_endedness_in_ai_and_machine_learning/</guid><description>tags Open-ended Evolution, Artificial Intelligence speaker Kenneth Stanley source Youtube Why should we care about open-endedness? There is nothing you can point to that would be worth coming back to a billion years from now to see what happened. And yet, we are inside of such a system and such a system produced us.
Evolution is a seemingly open-ended process for which we only have access to a single run&amp;rsquo;s current and past results.</description></item><item><title>Picbreeder</title><link>https://hugocisneros.com/notes/picbreeder/</link><pubDate>Wed, 10 Feb 2021 11:02:00 +0100</pubDate><guid>https://hugocisneros.com/notes/picbreeder/</guid><description> tags Search, Open-ended Evolution</description></item><item><title>Nyström method</title><link>https://hugocisneros.com/notes/nystrom_method/</link><pubDate>Wed, 10 Feb 2021 09:36:00 +0100</pubDate><guid>https://hugocisneros.com/notes/nystrom_method/</guid><description> tags Applied maths This method was introduced to machine learning as a way to speed up kernel machines in (Williams, Seeger 2001).
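Below is a minimal NumPy sketch of the Nyström approximation. It assumes landmarks are sampled uniformly at random; other sampling schemes exist. With \(m\) landmarks, the \(n \times n\) kernel matrix is approximated as \(K \approx C W^{+} C^\top\):

- \(C\) holds the kernel values between all points and the landmarks.
- \(W\) holds the kernel values among the landmarks.

Only \(n \times m\) kernel evaluations are needed. The function names are illustrative.

```python
import numpy as np


def linear_kernel(a, b):
    return a @ b.T


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||a_i - b_j||^2) for all pairs of rows."""
    sq_dists = (np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :]
                - 2 * a @ b.T)
    return np.exp(-gamma * sq_dists)


def nystrom_factors(x, kernel, n_landmarks, rng):
    """Return (C, W_pinv) such that K is approximated by C @ W_pinv @ C.T."""
    idx = rng.choice(len(x), size=n_landmarks, replace=False)  # uniform landmarks
    c = kernel(x, x[idx])  # n x m: kernel between all points and the landmarks
    w = c[idx]             # m x m: kernel among the landmarks
    return c, np.linalg.pinv(w)


rng = np.random.default_rng(0)
x = rng.normal(size=(500, 5))
c, w_pinv = nystrom_factors(x, rbf_kernel, 50, rng)
k_approx = c @ w_pinv @ c.T
print(np.abs(k_approx - rbf_kernel(x, x)).max())  # worst-case approximation error
```

For a kernel matrix of rank \(r\), such as a linear kernel on \(r\)-dimensional data, \(r\) linearly independent landmarks already reproduce \(K\) exactly.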
Bibliography Christopher Williams, Matthias Seeger. 2001. "Using the Nyström Method to Speed up Kernel Machines". In Advances in Neural Information Processing Systems, edited by T. Leen, T. Dietterich, and V. Tresp, 13:682–88. MIT Press. https://proceedings.neurips.cc/paper/2000/file/19de10adbaa1b2ee13f77f679fa1483a-Paper.pdf.</description></item><item><title>Kernel Machine</title><link>https://hugocisneros.com/notes/kernel_machine/</link><pubDate>Fri, 05 Feb 2021 13:48:00 +0100</pubDate><guid>https://hugocisneros.com/notes/kernel_machine/</guid><description> tags Kernel Methods</description></item><item><title>Gradient flow</title><link>https://hugocisneros.com/notes/gradient_flow/</link><pubDate>Wed, 09 Dec 2020 14:11:00 +0100</pubDate><guid>https://hugocisneros.com/notes/gradient_flow/</guid><description>tags Gradient descent, Optimization The gradient flow for a model parametrized by parameters \(w\) and a loss function \(L\) is written:
\[ \dot{w} = - \nabla L (w(t)) \]</description></item><item><title>Privacy-preserving machine learning</title><link>https://hugocisneros.com/notes/privacy_preserving_machine_learning/</link><pubDate>Wed, 02 Dec 2020 11:16:00 +0100</pubDate><guid>https://hugocisneros.com/notes/privacy_preserving_machine_learning/</guid><description>tags Machine learning, Online privacy This is a kind of machine learning where one wants to train a model or perform inference without transmitting sensitive information.
This information could leak because of data transmission to an untrusted computing server, or because the model itself reveals the structure of its training data (Ateniese et al. 2013; Song et al. 2017).
Bibliography Giuseppe Ateniese, Giovanni Felici, Luigi V. Mancini, Angelo Spognardi, Antonio Villani, Domenico Vitali.</description></item><item><title>Homomorphic encryption</title><link>https://hugocisneros.com/notes/fully_homomorphic_encryption/</link><pubDate>Wed, 02 Dec 2020 10:28:00 +0100</pubDate><guid>https://hugocisneros.com/notes/fully_homomorphic_encryption/</guid><description>tags Cryptography resources Vitalik Buterin&amp;rsquo;s blog Principle The idea of homomorphic encryption is to encrypt data in such a way that computations can be carried out directly on the ciphertexts. Given a function \(f\) and a message to encrypt \(x\), one can compute an encryption of \(f(x)\) from \(\text{enc}(x)\) without decrypting it. This is loosely written \(\text{enc}(f(x)) = f(\text{enc}(x))\).
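Here is a toy illustration, not a usable scheme. Textbook RSA is only partially homomorphic: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. The tiny parameters below are the classic small RSA example and are completely insecure. Fully homomorphic schemes, which support arbitrary functions, typically rely on quite different, lattice-based constructions. The function names are illustrative, and `pow(E, -1, PHI)` requires Python 3.8 or later.

```python
# Toy textbook-RSA parameters: far too small (and unpadded) to be secure
P, Q = 61, 53
N = P * Q                # modulus, 3233
PHI = (P - 1) * (Q - 1)  # 3120
E = 17                   # public exponent, coprime with PHI
D = pow(E, -1, PHI)      # private exponent: modular inverse of E (2753)


def encrypt(m):
    return pow(m, E, N)


def decrypt(c):
    return pow(c, D, N)


def multiply_encrypted(c1, c2):
    """Multiply two ciphertexts without decrypting them.
    (a^E * b^E) mod N == (a * b)^E mod N, i.e. an encryption of a * b mod N."""
    return (c1 * c2) % N


a, b = 7, 11
product_ct = multiply_encrypted(encrypt(a), encrypt(b))
print(decrypt(product_ct))  # 77
```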
This idea is similar in spirit to Privacy-preserving machine learning, or federated learning, where one wants to obfuscate data while still being able to use it in a learning model. Here, one considers arbitrary functions.</description></item><item><title>Graph convolutional networks</title><link>https://hugocisneros.com/notes/graph_convolutional_networks/</link><pubDate>Tue, 01 Dec 2020 15:34:00 +0100</pubDate><guid>https://hugocisneros.com/notes/graph_convolutional_networks/</guid><description> tags Convolutional neural networks, Graph neural networks</description></item><item><title>Complexity of cellular automata</title><link>https://hugocisneros.com/notes/complexity_of_cellular_automata/</link><pubDate>Wed, 25 Nov 2020 09:20:00 +0100</pubDate><guid>https://hugocisneros.com/notes/complexity_of_cellular_automata/</guid><description> tags Complexity, Cellular automata Measuring complexity created by cellular automata is a vast subject.
Using Entropy In (Wuensche 1999), the author uses the entropy of rule table lookup frequencies to evaluate the complexity of a CA.
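Here is a minimal sketch of this input entropy for elementary (binary, radius-1) cellular automata with periodic boundaries. At each time step, it counts how often each of the 8 rule-table entries is looked up and takes the Shannon entropy of those frequencies. Wuensche then looks at how this entropy fluctuates over time, with a high variance suggesting complex rules. The function names, lattice size, number of steps and use of the plain variance are assumptions for illustration.

```python
import math
import random
from collections import Counter


def neighborhood_codes(cells):
    """Index (0-7) of the rule-table entry looked up by each cell (periodic boundary)."""
    n = len(cells)
    return [4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n] for i in range(n)]


def ca_step(cells, rule):
    """Apply the elementary CA with Wolfram rule number `rule` (0-255) once."""
    return [(rule // 2 ** code) % 2 for code in neighborhood_codes(cells)]


def input_entropy(cells):
    """Shannon entropy (bits) of the rule-table lookup frequencies at one time step."""
    n = len(cells)
    counts = Counter(neighborhood_codes(cells))
    return -sum((q / n) * math.log2(q / n) for q in counts.values())


def entropy_profile(rule, width=200, steps=200, seed=0):
    """Mean and variance of the input entropy over a run from a random initial state."""
    rng = random.Random(seed)
    cells = [rng.randint(0, 1) for _ in range(width)]
    entropies = []
    for _ in range(steps):
        entropies.append(input_entropy(cells))
        cells = ca_step(cells, rule)
    mean = sum(entropies) / len(entropies)
    var = sum((e - mean) ** 2 for e in entropies) / len(entropies)
    return mean, var


for rule in (0, 30, 110):
    mean, var = entropy_profile(rule)
    print(f"rule {rule}: mean {mean:.3f} bits, variance {var:.4f}")
```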
Bibliography Andrew Wuensche. 1999. "Classifying Cellular Automata Automatically: Finding Gliders, Filtering, and Relating Space-Time Patterns, Attractor Basins, and the Z Parameter". Complexity 4 (3):47–66. DOI.</description></item><item><title>Rainbow tables</title><link>https://hugocisneros.com/notes/rainbow_tables/</link><pubDate>Wed, 25 Nov 2020 09:14:00 +0100</pubDate><guid>https://hugocisneros.com/notes/rainbow_tables/</guid><description> tags Cryptography resources How Rainbow tables work</description></item><item><title>Self-replication</title><link>https://hugocisneros.com/notes/self_replication/</link><pubDate>Thu, 12 Nov 2020 11:49:00 +0100</pubDate><guid>https://hugocisneros.com/notes/self_replication/</guid><description> tags Complexity, Self-organization An early example of artificial self-replication is Von Neumann&amp;rsquo;s self-reproducing CA which is a cellular automaton.
Self-replication in neural networks can be done with neural network quines (Chang, Lipson 2018).
Bibliography Oscar Chang, Hod Lipson. May 24, 2018. "Neural Network Quine". http://arxiv.org/abs/1803.05859.</description></item><item><title>Dirichlet energy</title><link>https://hugocisneros.com/notes/dirichlet_energy/</link><pubDate>Tue, 10 Nov 2020 09:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/dirichlet_energy/</guid><description>The Dirichlet energy of a differentiable function \(f\) on \(\mathbb{R}^d\) is the squared \(L^2\) norm of its gradient, often with a factor \(\frac{1}{2}\): \(E(f) = \frac{1}{2}\int \|\nabla f\|^2 \, dx\).
On a graph, such as the grid of a cellular automaton or the connectivity graph of a Hopfield network, it can be discretized as the sum, over all edges, of the squared difference between the function values at the two endpoints.</description></item><item><title>Hadamard product</title><link>https://hugocisneros.com/notes/hadamard_product/</link><pubDate>Thu, 05 Nov 2020 10:23:00 +0100</pubDate><guid>https://hugocisneros.com/notes/hadamard_product/</guid><description>The Hadamard product is a mathematical name for element-wise multiplication of matrices of the same shape.</description></item><item><title>Wavelets</title><link>https://hugocisneros.com/notes/wavelets/</link><pubDate>Wed, 04 Nov 2020 09:23:00 +0100</pubDate><guid>https://hugocisneros.com/notes/wavelets/</guid><description> tags Applied maths, Signal processing Wavelets are functions with specific properties that make them useful when dealing with images. They are used for lossy image compression, for example in JPEG 2000.
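Here is a minimal sketch of the simplest case, a one-level orthonormal Haar transform of a 1D signal. Each pair of samples is replaced by a scaled average (approximation coefficient) and a scaled difference (detail coefficient). Lossy compression can then be illustrated by shrinking small detail coefficients to zero before inverting. Soft thresholding is one common choice and is assumed here. For images, the same transform is typically applied along rows and then columns. The function names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """One level of the orthonormal Haar transform (signal length must be even)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    out = []
    for s, d in zip(approx, detail):
        out.extend([(s + d) / SQRT2, (s - d) / SQRT2])
    return out


def soft_threshold(coeffs, threshold):
    """Shrink coefficients toward zero; those smaller than threshold become 0."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


signal = [4, 6, 10, 12, 8, 8, 0, 2]
approx, detail = haar_forward(signal)
compressed = haar_inverse(approx, soft_threshold(detail, 1.5))
print(compressed)  # each pair replaced by its mean, up to rounding: 5, 5, 11, 11, 8, 8, 1, 1
```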
Types of wavelets Haar wavelets Daubechies wavelets</description></item><item><title>Signal processing</title><link>https://hugocisneros.com/notes/signal_processing/</link><pubDate>Wed, 04 Nov 2020 09:18:00 +0100</pubDate><guid>https://hugocisneros.com/notes/signal_processing/</guid><description> tags Applied maths</description></item><item><title>Image processing</title><link>https://hugocisneros.com/notes/image_processing/</link><pubDate>Wed, 04 Nov 2020 09:17:00 +0100</pubDate><guid>https://hugocisneros.com/notes/image_processing/</guid><description>tags Applied maths, Signal processing Scale an image with no interpolation Imagemagick documentation
convert source.[png|gif|...] -scale 400% target.[png|gif|...] The -scale option also accepts an absolute geometry such as 800x600 (without the percent sign) to set the target size.
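For integer magnification factors, scaling without interpolation amounts to pixel replication: each pixel becomes a block of identical pixels. Here is a minimal NumPy sketch of that idea. It assumes the image is already loaded as an array of shape (height, width) or (height, width, channels). It illustrates the principle and is not ImageMagick's implementation; the function name is illustrative.

```python
import numpy as np


def scale_nearest(image, factor):
    """Enlarge an image by an integer factor by replicating each pixel
    into a factor x factor block (no interpolation)."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


img = np.array([[0, 255],
                [255, 0]], dtype=np.uint8)  # 2x2 checkerboard
print(scale_nearest(img, 4).shape)  # (8, 8)
```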
Remove metadata from an image Useful for preserving Online privacy when publishing images. Pictures taken with smartphones and other modern devices often embed metadata (such as EXIF tags) about location, date and time, and device type.</description></item><item><title>Neuroscience</title><link>https://hugocisneros.com/notes/neuroscience/</link><pubDate>Mon, 02 Nov 2020 15:58:00 +0100</pubDate><guid>https://hugocisneros.com/notes/neuroscience/</guid><description> tags Biological life, Artificial Intelligence</description></item><item><title>Dynamical systems</title><link>https://hugocisneros.com/notes/dynamical_systems/</link><pubDate>Tue, 27 Oct 2020 20:26:00 +0100</pubDate><guid>https://hugocisneros.com/notes/dynamical_systems/</guid><description> tags Applied maths, Physics</description></item><item><title>Stable marriage problem</title><link>https://hugocisneros.com/notes/stable_marriage_problem/</link><pubDate>Thu, 22 Oct 2020 09:42:00 +0200</pubDate><guid>https://hugocisneros.com/notes/stable_marriage_problem/</guid><description> tags Algorithm</description></item><item><title>Turing completeness of cellular automata</title><link>https://hugocisneros.com/notes/turing_completeness_of_cellular_automata/</link><pubDate>Tue, 20 Oct 2020 08:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/turing_completeness_of_cellular_automata/</guid><description> tags Cellular automata, Turing-completeness Rule 110 Elementary cellular automaton rule 110 is universal (Cook 2004).
Game of Life Conway&amp;rsquo;s Game of Life has also been shown to be Turing-complete. Gliders can be used to implement logic gates.
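Gliders are small patterns that travel across the grid, which is what makes them usable as signals in such constructions. Here is a minimal NumPy sketch of one Game of Life step on a toroidal (wrap-around) grid. It shows that a glider reappears shifted by one cell diagonally every 4 generations. It does not build a logic gate. The grid size and function name are illustrative choices.

```python
import numpy as np


def life_step(grid):
    """One Game of Life generation on a toroidal grid of 0s and 1s."""
    # Count the 8 neighbors of every cell by summing shifted copies of the grid
    neighbors = sum(
        np.roll(np.roll(grid, dr, axis=0), dc, axis=1)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Alive next step: exactly 3 neighbors, or currently alive with exactly 2
    alive = np.logical_or(neighbors == 3, np.logical_and(grid == 1, neighbors == 2))
    return alive.astype(int)


grid = np.zeros((8, 8), dtype=int)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:  # a glider
    grid[r, c] = 1

state = grid
for _ in range(4):
    state = life_step(state)

# After 4 generations the glider has moved one cell down and one cell right
print((state == np.roll(np.roll(grid, 1, axis=0), 1, axis=1)).all())  # True
```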
A working computer in Game of Life
Bibliography Matthew Cook. 2004. "Universality in Elementary Cellular Automata". Complex Systems 15 (1):1–40.</description></item><item><title>Reversible cellular automata</title><link>https://hugocisneros.com/notes/reversible_cellular_automata/</link><pubDate>Fri, 16 Oct 2020 14:11:00 +0200</pubDate><guid>https://hugocisneros.com/notes/reversible_cellular_automata/</guid><description> tags Cellular automata Second-order CA Block CA</description></item><item><title>Raven's progressive matrices</title><link>https://hugocisneros.com/notes/raven_s_progressive_matrices/</link><pubDate>Fri, 16 Oct 2020 10:57:00 +0200</pubDate><guid>https://hugocisneros.com/notes/raven_s_progressive_matrices/</guid><description>tags Artificial intelligence test It is a visual test used to estimate abstract reasoning. The patterns are often between 2x2 and 6x6 matrices of symbols. One of these symbols is usually left blank and supposed to be deduced from the others.
The overall concept is very similar to the Abstraction and Reasoning Corpus but is much more tied to human vision. This makes the range of possible tasks much larger but also harder to integrate in an algorithm.</description></item><item><title>Turing Machine</title><link>https://hugocisneros.com/notes/turing_machine/</link><pubDate>Mon, 05 Oct 2020 08:07:00 +0200</pubDate><guid>https://hugocisneros.com/notes/turing_machine/</guid><description>tags Computability theory, Computer science resources Wikipedia The machine was invented by Alan Turing in 1936.
General definition A Turing Machine is usually composed of four main components:
A tape divided into cells. This tape is the way the machine reads inputs, writes outputs and manipulates information (storing it, moving it, etc.). Each cell can contain any symbol of a predefined alphabet. It is also often presented as infinitely long on both sides.</description></item><item><title>Alan Turing</title><link>https://hugocisneros.com/notes/alan_turing/</link><pubDate>Mon, 05 Oct 2020 08:06:00 +0200</pubDate><guid>https://hugocisneros.com/notes/alan_turing/</guid><description> tags Computer science, Artificial Intelligence, Cryptography</description></item><item><title>Optimization</title><link>https://hugocisneros.com/notes/optimization/</link><pubDate>Fri, 02 Oct 2020 16:43:00 +0200</pubDate><guid>https://hugocisneros.com/notes/optimization/</guid><description> tags Mathematics, Applied maths</description></item><item><title>Echo-state networks</title><link>https://hugocisneros.com/notes/echo_state_networks/</link><pubDate>Fri, 02 Oct 2020 16:17:00 +0200</pubDate><guid>https://hugocisneros.com/notes/echo_state_networks/</guid><description>tags Recurrent neural networks, Unsupervised learning resources Scholarpedia Principle An echo state network is usually a standard RNN whose input and recurrent weights are fixed at random and never trained. The states of this RNN (the reservoir) are used as a high-dimensional feature map fed into a machine learning system, typically a linear readout, which is the only trained part.
(Jaeger 2004; Jaeger et al. 2007; Jaeger 2012)
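Here is a minimal NumPy sketch of this principle. It makes three assumptions:

- The reservoir uses tanh units.
- Its random recurrent matrix is rescaled to a spectral radius below 1, a common heuristic for obtaining the echo state property.
- The linear readout is fitted by ridge regression and is the only trained part.

Sizes, scalings and names are illustrative assumptions.

```python
import numpy as np


def make_reservoir(n_inputs, n_reservoir, spectral_radius=0.9, seed=0):
    """Random, fixed input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
    w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    # Rescale so that the largest absolute eigenvalue equals spectral_radius
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(w_in, w, inputs):
    """Drive the reservoir with a sequence of input vectors; return all states."""
    state = np.zeros(w.shape[0])
    states = []
    for u in inputs:
        state = np.tanh(w_in @ u + w @ state)
        states.append(state)
    return np.array(states)


def fit_readout(states, targets, ridge=1e-6):
    """Linear readout trained by ridge regression."""
    gram = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(gram, states.T @ targets)


# One-step-ahead prediction of a sine wave
signal = np.sin(0.1 * np.arange(1000))
inputs, targets = signal[:-1].reshape(-1, 1), signal[1:]
w_in, w = make_reservoir(1, 100)
states = run_reservoir(w_in, w, inputs)
washout = 100  # discard the initial transient
w_out = fit_readout(states[washout:800], targets[washout:800])
predictions = states[800:] @ w_out
print(np.mean((predictions - targets[800:]) ** 2))  # test mean squared error
```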
Bibliography H. Jaeger. April 2, 2004. "Harnessing Nonlinearity: Predicting Chaotic Systems and Saving Energy in Wireless Communication". Science 304 (5667):78–80.</description></item><item><title>Unsupervised learning</title><link>https://hugocisneros.com/notes/unsupervised_learning/</link><pubDate>Fri, 02 Oct 2020 09:16:00 +0200</pubDate><guid>https://hugocisneros.com/notes/unsupervised_learning/</guid><description> tags Machine learning</description></item><item><title>Python</title><link>https://hugocisneros.com/notes/python/</link><pubDate>Mon, 28 Sep 2020 10:32:00 +0200</pubDate><guid>https://hugocisneros.com/notes/python/</guid><description>tags Programming languages, Coding Code tips Categories to one-hot This is a handy technique, but it can be very memory-intensive for large arrays or many categories, since it builds a dense array with one column per category.
import numpy as np
a = np.random.randint(5, size=10)  # 10 random category labels in {0, ..., 4}
one_hot_a = np.eye(5)[a]  # row i is the one-hot encoding of a[i]
Side-output for jupyter notebooks Insert the following block in a notebook cell and execute it as code (From Twitter). This will put the output of each cell next to its code.
%%html &amp;lt;style&amp;gt; #notebook-container {width: 100%; background-color: #EEE} .</description></item><item><title>Self-organization</title><link>https://hugocisneros.com/notes/self_organization/</link><pubDate>Wed, 23 Sep 2020 16:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/self_organization/</guid><description>Self-organization is an emergent phenomenon
Open questions in self-organization From (Gershenson et al. 2020)
How can self-organization be programmed? This question is fundamental. It is one thing to have systems that exhibit beautiful and surprising self-organization, but it is a different thing to be able to steer it in the right direction and use it.
Can the macroscopic outcomes of self-organization be predicted? What is the role of self-organization in the open problems of ALife?</description></item><item><title>Optimal control</title><link>https://hugocisneros.com/notes/optimal_control/</link><pubDate>Wed, 23 Sep 2020 11:56:00 +0200</pubDate><guid>https://hugocisneros.com/notes/optimal_control/</guid><description>tags Applied maths resources Book by Daniel Liberzon Optimal control problem A typical optimal control problem starts with a control system \[ \dot{x} = f(t, x, u), \quad x(t_0) = x_0 \] where \(x\) is the state of the system, \(t\) represents time and \(u\) is the control input.
The goal of an OC problem is to minimize a cost functional of the form \[ J(u) := \int_{t_0}^{t_f}L(t, x(t), u(t))dt + K(t_f, x_f). \]</description></item><item><title>Robotics</title><link>https://hugocisneros.com/notes/robotics/</link><pubDate>Mon, 21 Sep 2020 22:59:00 +0200</pubDate><guid>https://hugocisneros.com/notes/robotics/</guid><description> tags Artificial life, Artificial Intelligence</description></item><item><title>Artificial intelligence test</title><link>https://hugocisneros.com/notes/artificial_intelligence_test/</link><pubDate>Wed, 16 Sep 2020 10:56:00 +0200</pubDate><guid>https://hugocisneros.com/notes/artificial_intelligence_test/</guid><description>tags Artificial Intelligence The most famous example is the Turing test.</description></item><item><title>Chinese room experiment</title><link>https://hugocisneros.com/notes/chinese_room_experiment/</link><pubDate>Wed, 16 Sep 2020 10:56:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chinese_room_experiment/</guid><description> tags Artificial intelligence test</description></item><item><title>Fast Marching method</title><link>https://hugocisneros.com/notes/fast_marching_method/</link><pubDate>Mon, 07 Sep 2020 10:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/fast_marching_method/</guid><description>tags Applied maths, Algorithm The fast marching method can be seen as a way to fix the metric issue of Dijkstra&amp;rsquo;s algorithm, which on a 4-connected grid actually computes the \(\ell_1\) distance rather than the Euclidean one. In the FM method, the graph update is replaced with the resolution of the Eikonal equation. This reduces the bias introduced by the grid, and the computed distance converges towards the underlying geodesic distance when the grid step size tends towards 0.
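The grid bias of Dijkstra's algorithm can be checked with a minimal sketch, assuming a 4-connected grid with unit edge weights. The distance returned is the \(\ell_1\) (Manhattan) distance, so the opposite corner of an \(n \times n\) grid ends up at distance \(2(n-1)\) instead of the Euclidean \((n-1)\sqrt{2}\). The function name is illustrative.

```python
import heapq
import math


def grid_dijkstra(rows, cols, source):
    """Dijkstra's algorithm on a 4-connected grid where every edge has weight 1."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d != dist[(r, c)]:
            continue  # stale heap entry: a shorter path was found later
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if nr not in range(rows) or nc not in range(cols):
                continue  # neighbor outside the grid
            nd = d + 1.0
            if dist.get((nr, nc), math.inf) > nd:
                dist[(nr, nc)] = nd
                heapq.heappush(heap, (nd, (nr, nc)))
    return dist


n = 11
dist = grid_dijkstra(n, n, (0, 0))
print(dist[(n - 1, n - 1)], (n - 1) * math.sqrt(2))  # 20.0 versus about 14.14
```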
The FM algorithm replaces the graph update (\(D_j \leftarrow \min_{k \sim j} D_k + W_j\)) with a local resolution of the Eikonal equation</description></item><item><title>Dijkstra's algorithm</title><link>https://hugocisneros.com/notes/dijkstra_s_algorithm/</link><pubDate>Mon, 07 Sep 2020 10:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/dijkstra_s_algorithm/</guid><description> tags Applied maths, Algorithm</description></item><item><title>Article: Open-endedness: The last grand challenge you’ve never heard of</title><link>https://hugocisneros.com/notes/open_endedness_the_last_grand_challenge_you_ve_never_heard_of/</link><pubDate>Mon, 24 Aug 2020 12:09:00 +0200</pubDate><guid>https://hugocisneros.com/notes/open_endedness_the_last_grand_challenge_you_ve_never_heard_of/</guid><description> authors Kenneth Stanley tags Artificial Intelligence, Open-ended Evolution source Link</description></item><item><title>Kenneth Stanley</title><link>https://hugocisneros.com/notes/kenneth_stanley/</link><pubDate>Mon, 24 Aug 2020 12:04:00 +0200</pubDate><guid>https://hugocisneros.com/notes/kenneth_stanley/</guid><description>Ken Stanley is a researcher at OpenAI.</description></item><item><title>Crosshatch automata</title><link>https://hugocisneros.com/notes/crosshatch_automata/</link><pubDate>Mon, 24 Aug 2020 10:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/crosshatch_automata/</guid><description> resources Medium article tags Cellular automata</description></item><item><title>Article: The Cartoon Picture of Magnets That Has Transformed Science</title><link>https://hugocisneros.com/notes/the_cartoon_picture_of_magnets_that_has_transformed_science/</link><pubDate>Mon, 24 Aug 2020 09:18:00 +0200</pubDate><guid>https://hugocisneros.com/notes/the_cartoon_picture_of_magnets_that_has_transformed_science/</guid><description>tags Ising model, Complex Systems source Quanta magazine The Ising model is an example of a very simply defined model from which complex behavior emerges.
The model was originally introduced by Wilhelm Lenz and his graduate student Ernst Ising to understand why magnets lose their attractive power when heated past a certain temperature. It was first studied in 1D, where it fails to show that a magnet stays magnetized, and was therefore abandoned.</description></item><item><title>Article: The End of the RNA World Is Near, Biochemists Argue</title><link>https://hugocisneros.com/notes/the_end_of_the_rna_world_is_near_biochemists_argue/</link><pubDate>Mon, 24 Aug 2020 09:17:00 +0200</pubDate><guid>https://hugocisneros.com/notes/the_end_of_the_rna_world_is_near_biochemists_argue/</guid><description>source https://www.quantamagazine.org/the-end-of-the-rna-world-is-near-biochemists-argue-20171219/ tags Biological life This article is about alternatives to the dominant RNA-world theories.
Objections to RNA:
Crucial processes that we consider part of life could not have been carried out by a single polymer, and particularly not by RNA, because the corresponding chemical reactions have rates spanning 20 orders of magnitude. RNA also cannot explain the emergence of the genetic code: it would have taken too long for RNA alone to find the mapping rules from the 64 three-nucleotide sequences to the 20 amino acids.</description></item><item><title>Article: What Is an Individual? Biology Seeks Clues in Information Theory.</title><link>https://hugocisneros.com/notes/what_is_an_individual_biology_seeks_clues_in_information_theory/</link><pubDate>Mon, 24 Aug 2020 09:17:00 +0200</pubDate><guid>https://hugocisneros.com/notes/what_is_an_individual_biology_seeks_clues_in_information_theory/</guid><description>tags Life resources (Krakauer et al. 2020) source Quanta Magazine “In a way, [biology] is a science of individuality,” said Melanie Mitchell, a computer scientist at the Santa Fe Institute.
And yet, the notion of what it means to be an individual often gets glossed over. “So far we have a concept of ‘individual’ that’s very much like the concept of ‘pile,’” said Maxwell Ramstead, a postdoctoral researcher at McGill University.</description></item><item><title>Melanie Mitchell</title><link>https://hugocisneros.com/notes/melanie_mitchell/</link><pubDate>Mon, 24 Aug 2020 08:59:00 +0200</pubDate><guid>https://hugocisneros.com/notes/melanie_mitchell/</guid><description>resources Website She has worked at the Santa Fe Institute and studies Complex Systems and Artificial Intelligence.</description></item><item><title>Statistical complexity</title><link>https://hugocisneros.com/notes/statistical_complexity/</link><pubDate>Wed, 29 Jul 2020 14:24:00 +0200</pubDate><guid>https://hugocisneros.com/notes/statistical_complexity/</guid><description>tags Complexity metrics papers (Crutchfield, Young 1989) One interpretation of the statistical complexity is that it is the minimum amount of historical information required to make optimal forecasts of bits in \(x\) at the error rate \(h_\mu\).
For a constant sequence, \(C_\mu(x) = 0\) (more generally, \(C_\mu(x) = \log_2 p\) for a sequence of period \(p\)), and for ideal random sequences \(C_\mu(x) = 0\) too.
Several researchers have tried to capture the properties of statistical complexity with practical alternatives. The resulting complexity metrics include:</description></item><item><title>Talk: Alife 2020 keynote Lee Cronin - A Top Down Chemically Embodied Artificial Life Computation</title><link>https://hugocisneros.com/notes/talk_alife_2020_keynote_lee_cronin_a_top_down_chemically_embodied_artificial_life_computation/</link><pubDate>Wed, 29 Jul 2020 14:24:00 +0200</pubDate><guid>https://hugocisneros.com/notes/talk_alife_2020_keynote_lee_cronin_a_top_down_chemically_embodied_artificial_life_computation/</guid><description>tags Life, ALife 2020 Complex molecules are bio-signatures: they are the sign of complex (evolutionary?) processes that have taken place.
Assembly theory Exploring complexity: Lee presents a theoretical idea for a complexity metric. Like the authors of many other metrics, he starts from the observation that neither entropy nor Kolmogorov complexity is suitable for taking the history of an object into account.
Instead of thinking in terms of disorder or complexity, why not simply ask &amp;ldquo;how has this object been assembled?</description></item><item><title>Urban science</title><link>https://hugocisneros.com/notes/urban_science/</link><pubDate>Wed, 29 Jul 2020 10:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/urban_science/</guid><description/></item><item><title>Fractional calculus</title><link>https://hugocisneros.com/notes/fractional_calculus/</link><pubDate>Tue, 28 Jul 2020 17:29:00 +0200</pubDate><guid>https://hugocisneros.com/notes/fractional_calculus/</guid><description> tags Mathematics resources Wikipedia</description></item><item><title>Program synthesis</title><link>https://hugocisneros.com/notes/program_synthesis/</link><pubDate>Mon, 27 Jul 2020 15:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/program_synthesis/</guid><description>tags Computer science, Coding Program synthesis is the task of writing programs automatically for a given task. This is widely considered a very hard problem in the general case, as the computational languages that humans use are hard to manipulate &amp;ldquo;smoothly&amp;rdquo;.
Compilation is a type of program synthesis where both the source language and the target language are well-defined. A compiler is written for a given source language/target language pair and is therefore doing well-specified program synthesis.</description></item><item><title>Epistasis</title><link>https://hugocisneros.com/notes/epistasis/</link><pubDate>Mon, 27 Jul 2020 15:14:00 +0200</pubDate><guid>https://hugocisneros.com/notes/epistasis/</guid><description>Epistasis is about interactions between mutations in an evolving system.
No epistasis corresponds to mutation effects &amp;ldquo;stacking&amp;rdquo; without any particular kind of interaction. Positive epistasis happens when the combined effect of two mutations is more positive than the sum of their individual contributions. Negative epistasis is the same principle in the negative direction: the combined effect is more negative than the sum of the contributions. Cancer is an example of negative epistasis, where the accumulation of many mutations is needed to obtain a cancerous cell.</description></item><item><title>Noise</title><link>https://hugocisneros.com/notes/noise/</link><pubDate>Mon, 27 Jul 2020 14:03:00 +0200</pubDate><guid>https://hugocisneros.com/notes/noise/</guid><description> tags Statistics, Applied maths</description></item><item><title>MAP-Elites</title><link>https://hugocisneros.com/notes/map_elites/</link><pubDate>Mon, 27 Jul 2020 13:24:00 +0200</pubDate><guid>https://hugocisneros.com/notes/map_elites/</guid><description>tags Quality diversity, Reinforcement learning papers (Mouret, Clune 2015; Cully et al. 2015) MAP-Elites is an example of a QD algorithm. The behavior space is discretized into cells and, during exploration, only the best &amp;ldquo;elite&amp;rdquo; of each cell is kept.
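Below is a minimal Python sketch of the MAP-Elites loop. It uses a hypothetical toy problem: a solution is a point \((x, y)\) in the unit square, its behavior descriptor is \(x\) split into 10 cells, and its fitness is \(-(y - 0.5)^2\). The helper names, the Gaussian mutation and the 100-sample random initialization are illustrative choices, not the settings of the original papers. `try_insert` keeps one elite per cell. It accepts a newcomer only if the cell is empty or the newcomer is fitter than the current elite.

```python
import random


def try_insert(archive, cell, fitness, solution):
    """MAP-Elites insertion rule: keep the solution if its cell is empty
    or if it is fitter than the cell's current elite."""
    elite = archive.get(cell)
    if elite is None or fitness > elite[0]:
        archive[cell] = (fitness, solution)
        return True
    return False


def map_elites(n_cells=10, n_init=100, n_iter=2000, sigma=0.1, seed=0):
    # Hypothetical toy problem: a solution is a point (x, y) in [0, 1]^2,
    # its behavior descriptor is x and its fitness is -(y - 0.5)^2.
    rng = random.Random(seed)

    def clip(v):
        return max(0.0, min(1.0, v))

    def cell_of(sol):
        return min(int(sol[0] * n_cells), n_cells - 1)

    def fitness_of(sol):
        return -(sol[1] - 0.5) ** 2

    archive = {}  # cell index -> (fitness, solution)
    for it in range(n_iter):
        if it >= n_init:
            # Variation phase: mutate a randomly chosen elite.
            _, parent = archive[rng.choice(list(archive))]
            sol = tuple(clip(v + rng.gauss(0.0, sigma)) for v in parent)
        else:
            # Initialization phase: sample uniformly at random.
            sol = (rng.random(), rng.random())
        try_insert(archive, cell_of(sol), fitness_of(sol), sol)
    return archive
```

The returned archive maps each cell to its elite. The result is therefore a set of diverse, locally best solutions rather than a single optimum.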
Individuals are added to the grid if they:
fill an empty cell, or are better than the existing elite of their cell. Bibliography Jean-Baptiste Mouret, Jeff Clune. April 19, 2015. "Illuminating Search Spaces by Mapping Elites"</description></item><item><title>NK model</title><link>https://hugocisneros.com/notes/nk_model/</link><pubDate>Sun, 26 Jul 2020 19:57:00 +0200</pubDate><guid>https://hugocisneros.com/notes/nk_model/</guid><description> tags Complex Systems</description></item><item><title>Co-evolution</title><link>https://hugocisneros.com/notes/co_evolution/</link><pubDate>Sun, 26 Jul 2020 19:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/co_evolution/</guid><description> tags Evolution</description></item><item><title>Assembly language</title><link>https://hugocisneros.com/notes/assembly_language/</link><pubDate>Sun, 26 Jul 2020 19:09:00 +0200</pubDate><guid>https://hugocisneros.com/notes/assembly_language/</guid><description> tags Programming languages</description></item><item><title>Federated learning</title><link>https://hugocisneros.com/notes/federated_learning/</link><pubDate>Sun, 26 Jul 2020 17:37:00 +0200</pubDate><guid>https://hugocisneros.com/notes/federated_learning/</guid><description> tags Machine learning</description></item><item><title>Langton's loop</title><link>https://hugocisneros.com/notes/langton_s_loop/</link><pubDate>Sat, 25 Jul 2020 22:35:00 +0200</pubDate><guid>https://hugocisneros.com/notes/langton_s_loop/</guid><description> tags Christopher Langton, Cellular automata</description></item><item><title>Programming languages</title><link>https://hugocisneros.com/notes/programming_languages/</link><pubDate>Wed, 22 Jul 2020 15:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/programming_languages/</guid><description> tags Computer science PL I use or have used:
Python, C Programming language, C++, Javascript, Rust, Scala, Java, Ruby, ELisp, Haskell</description></item><item><title>Functional programming</title><link>https://hugocisneros.com/notes/functional_programming/</link><pubDate>Wed, 22 Jul 2020 10:15:00 +0200</pubDate><guid>https://hugocisneros.com/notes/functional_programming/</guid><description> tags Computer science, Coding Examples of functional programming languages: Lisp, Haskell</description></item><item><title>Haskell</title><link>https://hugocisneros.com/notes/haskell/</link><pubDate>Wed, 22 Jul 2020 10:14:00 +0200</pubDate><guid>https://hugocisneros.com/notes/haskell/</guid><description> tags Programming languages, Coding</description></item><item><title>Adaptive Computation Time</title><link>https://hugocisneros.com/notes/adaptive_computation_time/</link><pubDate>Tue, 21 Jul 2020 08:54:00 +0200</pubDate><guid>https://hugocisneros.com/notes/adaptive_computation_time/</guid><description>tags Neural networks, Algorithm Adaptive computation time (ACT) was introduced in (Graves 2017) as a way to make the amount of computation in RNNs adaptive. The network learns how many computational steps to use before emitting an output.
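Here is a minimal Python sketch of the halting rule for one input step. It assumes the halting probabilities \(h_1, h_2, \dots\) from the network's halting unit are already available as a non-empty list. The function names and the default \(\epsilon = 0.01\) are illustrative.

The rule works as follows:

- Updates stop at the first step \(N\) where the cumulative halting probability reaches \(1 - \epsilon\).
- The last step receives the remainder \(R = 1 - \sum_{n=1}^{N-1} h_n\), so the weights form a distribution.
- That distribution averages the intermediate outputs.
- \(N + R\) is the ponder cost.

```python
def act_halting(halting_probs, eps=0.01):
    """ACT halting rule for one input step.

    halting_probs: per-update halting probabilities h_1, h_2, ...
    (assumed non-empty; the last entry acts as the maximum step count).
    Returns the mixing weights p_n, the number of updates N and the
    remainder R, so that the ponder cost is N + R.
    """
    weights = []
    total = 0.0
    last = len(halting_probs) - 1
    for n, h in enumerate(halting_probs):
        if total + h >= 1.0 - eps or n == last:
            # Halt here: the final weight is whatever probability is left.
            remainder = 1.0 - total
            weights.append(remainder)
            return weights, n + 1, remainder
        weights.append(h)
        total += h


def act_output(weights, step_outputs):
    # Output for this input step: intermediate outputs weighted by the
    # halting distribution.
    return sum(p * y for p, y in zip(weights, step_outputs))
```

For instance, `act_halting([0.3, 0.5, 0.4])` stops after 3 updates with weights of about 0.3, 0.5 and 0.2.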
This is done by outputting an extra halting probability at each update step, and considering two timelines:
the input timeline, which plays the role of an outer loop: at each of its steps, a new input symbol is fed to the RNN.</description></item><item><title>Autopoiesis</title><link>https://hugocisneros.com/notes/autopoiesis/</link><pubDate>Mon, 20 Jul 2020 21:35:00 +0200</pubDate><guid>https://hugocisneros.com/notes/autopoiesis/</guid><description> tags Life</description></item><item><title>Tierra</title><link>https://hugocisneros.com/notes/tierra/</link><pubDate>Mon, 20 Jul 2020 13:48:00 +0200</pubDate><guid>https://hugocisneros.com/notes/tierra/</guid><description> tags Artificial life, Evolution</description></item><item><title>Santa Fe Institute</title><link>https://hugocisneros.com/notes/santa_fe_institute/</link><pubDate>Sun, 19 Jul 2020 22:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/santa_fe_institute/</guid><description> tags Complex Systems, Physics</description></item><item><title>Simpson's paradox</title><link>https://hugocisneros.com/notes/simpson_s_paradox/</link><pubDate>Sun, 19 Jul 2020 22:10:00 +0200</pubDate><guid>https://hugocisneros.com/notes/simpson_s_paradox/</guid><description> tags Statistics</description></item><item><title>The Simulated reality hypothesis</title><link>https://hugocisneros.com/notes/the_simulated_reality_hypothesis/</link><pubDate>Sun, 19 Jul 2020 21:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/the_simulated_reality_hypothesis/</guid><description>tags Philosophy The simulation argument Nick Bostrom proposed a trilemma in 2003:
&amp;ldquo;The fraction of human-level civilizations that reach a posthuman stage (that is, one capable of running high-fidelity ancestor simulations) is very close to zero&amp;rdquo;, or &amp;ldquo;The fraction of posthuman civilizations that are interested in running simulations of their evolutionary history, or variations thereof, is very close to zero&amp;rdquo;, or &amp;ldquo;The fraction of all people with our kind of experiences that are living in a simulation is very close to one.</description></item><item><title>Nick Bostrom</title><link>https://hugocisneros.com/notes/nick_bostrom/</link><pubDate>Sun, 19 Jul 2020 21:25:00 +0200</pubDate><guid>https://hugocisneros.com/notes/nick_bostrom/</guid><description> tags Philosophy</description></item><item><title>Bongard problems</title><link>https://hugocisneros.com/notes/bongard_problems/</link><pubDate>Fri, 17 Jul 2020 13:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/bongard_problems/</guid><description> tags Artificial intelligence test</description></item><item><title>Evaluating NLP</title><link>https://hugocisneros.com/notes/evaluating_nlp/</link><pubDate>Fri, 17 Jul 2020 13:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/evaluating_nlp/</guid><description>tags Natural language processing Language model evaluation Perplexity For a given word sequence \(\mathbf{w} = (w_1, \dots, w_n)\), perplexity (PPL) is defined as \[ PPL = 2^{-\frac{1}{n} \sum_{i=1}^n \log_2 P(w_i | w_{i-1} \dots w_1 )} \] Its exponent can be seen as the cross-entropy between an empirical distribution of test words and the predicted conditional word distribution.
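The definition translates directly into a few lines of Python. This sketch assumes the model's conditional probabilities \(P(w_i | w_{i-1} \dots w_1)\) have already been computed for each word of the sequence; `perplexity` is an illustrative name. A model that assigns probability \(1/256\) to every word spends 8 bits per word, so `perplexity([1 / 256] * 10)` returns `256.0`.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model's conditional
    probabilities P(w_i | w_{i-1} ... w_1), one entry per word."""
    n = len(token_probs)
    # Average number of bits per word (empirical cross-entropy).
    bits_per_word = -sum(math.log2(p) for p in token_probs) / n
    return 2 ** bits_per_word
```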
A language model that encodes each word with 8 bits on average has a perplexity of 256 (\(2^8\)).</description></item><item><title>Abstraction and Reasoning Corpus</title><link>https://hugocisneros.com/notes/abstraction_and_reasoning_corpus/</link><pubDate>Fri, 17 Jul 2020 13:44:00 +0200</pubDate><guid>https://hugocisneros.com/notes/abstraction_and_reasoning_corpus/</guid><description> tags Artificial intelligence test</description></item><item><title>Algorithmic probability</title><link>https://hugocisneros.com/notes/algorithmic_probability/</link><pubDate>Tue, 14 Jul 2020 08:34:00 +0200</pubDate><guid>https://hugocisneros.com/notes/algorithmic_probability/</guid><description> tags Complexity, Algorithmic Information theory</description></item><item><title>Automated discovery in complex systems</title><link>https://hugocisneros.com/notes/automated_discovery_in_complex_systems/</link><pubDate>Tue, 14 Jul 2020 08:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/automated_discovery_in_complex_systems/</guid><description>tags Complex Systems Evolutionary algorithms and CAs Evolutionary algorithms have been used to find Cellular automata rules with a specific behavior (Mitchell et al. 1996; Sapin et al. 2003). The objective is to optimize a fitness function (majority of cells, presence of gliders and periodic patterns, etc.).
Bibliography Melanie Mitchell, James P Crutchfield, Rajarshi Das. 1996. "Evolving Cellular Automata with Genetic Algorithms: A Review of Recent Work". In Proceedings of the First International Conference on Evolutionary Computation and Its Applications, 14.</description></item><item><title>Backward RNN</title><link>https://hugocisneros.com/notes/backward_rnn/</link><pubDate>Tue, 14 Jul 2020 08:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/backward_rnn/</guid><description>tags Recurrent neural networks Regular RNNs process input in sequence. When applied to a language modeling task, one tries to predict a word given the previous ones. For example, with the sentence The quick brown fox jumps over the lazy, a classical RNN will initialize an internal state \(s_0\) and process each word in sequence, starting from The and updating its internal state with each new word in order to make a final prediction.</description></item><item><title>Berry's paradox</title><link>https://hugocisneros.com/notes/berry_s_paradox/</link><pubDate>Tue, 14 Jul 2020 08:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/berry_s_paradox/</guid><description>Berry&amp;rsquo;s paradox is a sentence of the form &amp;ldquo;The smallest positive integer not definable in under sixty letters&amp;rdquo; (a phrase with fifty-seven letters).
An argument very similar to Berry&amp;rsquo;s paradox is used in the proof of uncomputability of Kolmogorov complexity.
Resolution An interesting study and resolution of Berry&amp;rsquo;s paradox</description></item><item><title>Causal inference</title><link>https://hugocisneros.com/notes/causal_inference/</link><pubDate>Tue, 14 Jul 2020 08:33:00 +0200</pubDate><guid>https://hugocisneros.com/notes/causal_inference/</guid><description> tags Statistics</description></item><item><title>Computability theory</title><link>https://hugocisneros.com/notes/computability_theory/</link><pubDate>Tue, 14 Jul 2020 08:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/computability_theory/</guid><description/></item><item><title>Edge detection</title><link>https://hugocisneros.com/notes/edge_detection/</link><pubDate>Tue, 14 Jul 2020 08:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/edge_detection/</guid><description>tags Image processing Canny edge detection Canny edge detection is the most famous edge detection algorithm, originally developed by John Canny in 1986.
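As a rough illustration of the gradient step listed below, here is a plain-Python sketch. It uses Sobel kernels, a common but here assumed choice of derivative filter. Rows index the vertical axis, the input is assumed to be already smoothed, and border pixels are simply left at 0.

```python
import math

# Sobel derivative kernels (an assumed choice of first-derivative filter).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradients(image):
    """Gradient magnitude G and direction Theta of a 2D list image.

    The image is assumed to be already smoothed; border pixels stay at 0.
    """
    rows, cols = len(image), len(image[0])
    mag = [[0.0] * cols for _ in range(rows)]
    theta = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = gy = 0.0
            # Correlate the 3 x 3 neighborhood with both kernels.
            for di in range(3):
                for dj in range(3):
                    pixel = image[i + di - 1][j + dj - 1]
                    gx += SOBEL_X[di][dj] * pixel
                    gy += SOBEL_Y[di][dj] * pixel
            mag[i][j] = math.hypot(gx, gy)
            theta[i][j] = math.atan2(gy, gx)
    return mag, theta
```

Consider an image that is 0 in its two left columns and 1 elsewhere. The pixels next to the step get \(\mathbf{G} = 4\) and \(\mathbf{\Theta} = 0\), i.e. a gradient pointing horizontally across the vertical edge.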
The algorithm has 5 steps:
Smooth the image with Gaussian filtering. Intensity gradients: the first derivatives in the horizontal (\(\mathbf{G}_x\)) and vertical (\(\mathbf{G}_y\)) directions are computed, then the gradient intensity \(\mathbf{G} = \sqrt{\mathbf{G}_x^2 + \mathbf{G}_y^2}\) and direction \(\mathbf{\Theta} = \text{atan2}(\mathbf{G}_y, \mathbf{G}_x)\). Edge thinning (non-maximum suppression) to reduce the blurring introduced by the first two steps.</description></item><item><title>Effective measure complexity</title><link>https://hugocisneros.com/notes/effective_measure_complexity/</link><pubDate>Tue, 14 Jul 2020 08:30:00 +0200</pubDate><guid>https://hugocisneros.com/notes/effective_measure_complexity/</guid><description> tags Complexity metrics</description></item><item><title>ELisp</title><link>https://hugocisneros.com/notes/elisp/</link><pubDate>Tue, 14 Jul 2020 08:29:00 +0200</pubDate><guid>https://hugocisneros.com/notes/elisp/</guid><description>ELisp is a dialect of the Lisp programming language.</description></item><item><title>Gaussian Processes</title><link>https://hugocisneros.com/notes/gaussian_processes/</link><pubDate>Tue, 14 Jul 2020 08:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/gaussian_processes/</guid><description> tags Machine learning resources K. 
Bailey&amp;rsquo;s blog post</description></item><item><title>Halting problem</title><link>https://hugocisneros.com/notes/halting_problem/</link><pubDate>Tue, 14 Jul 2020 08:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/halting_problem/</guid><description> tags Computability theory</description></item><item><title>Haskell Curry</title><link>https://hugocisneros.com/notes/haskell_curry/</link><pubDate>Tue, 14 Jul 2020 08:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/haskell_curry/</guid><description/></item><item><title>Information theory</title><link>https://hugocisneros.com/notes/information_theory/</link><pubDate>Tue, 14 Jul 2020 08:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/information_theory/</guid><description/></item><item><title>Java</title><link>https://hugocisneros.com/notes/java/</link><pubDate>Tue, 14 Jul 2020 08:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/java/</guid><description> tags Programming languages, Coding</description></item><item><title>Javascript</title><link>https://hugocisneros.com/notes/javascript/</link><pubDate>Tue, 14 Jul 2020 08:27:00 +0200</pubDate><guid>https://hugocisneros.com/notes/javascript/</guid><description> tags Programming languages, Coding</description></item><item><title>Jevons paradox</title><link>https://hugocisneros.com/notes/jevons_paradox/</link><pubDate>Tue, 14 Jul 2020 08:26:00 +0200</pubDate><guid>https://hugocisneros.com/notes/jevons_paradox/</guid><description>tags Economics, Climate resources (York, McGee 2016; Polimeni et al. 2015), Wikipedia, Real climate economics blog posts (Jim Barrett) Definition Jevons Paradox is used to describe the situation where an increase in resource efficiency triggered by technological innovation has the counter-intuitive effect of raising the demand and increasing the overall consumption.
It was first described in W. S. Jevons&amp;rsquo; book The Coal Question in 1865.
It is closely related to another paradox well known in road planning (the Downs–Thomson paradox) and to Wirth&amp;rsquo;s law in software engineering.</description></item><item><title>John Von Neumann</title><link>https://hugocisneros.com/notes/john_von_neumann/</link><pubDate>Tue, 14 Jul 2020 08:26:00 +0200</pubDate><guid>https://hugocisneros.com/notes/john_von_neumann/</guid><description/></item><item><title>Kaya identity</title><link>https://hugocisneros.com/notes/kaya_identity/</link><pubDate>Tue, 14 Jul 2020 08:24:00 +0200</pubDate><guid>https://hugocisneros.com/notes/kaya_identity/</guid><description>tags Climate Definition It was developed by Japanese economist Yoichi Kaya.
\(F\) is global CO2 emissions from human sources, \(P\) is global population, \(G\) is world GDP, \(E\) is global energy consumption. \[ F = P \times \frac{G}{P} \times \frac{E}{G} \times \frac{F}{E} \]
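A small numerical check of the identity follows. The values are placeholders, purely illustrative and not real statistics, and the helper names are hypothetical. Splitting \(F\) into the three ratios and multiplying back by \(P\) recovers \(F\). Acting on a single lever scales emissions by the same factor.

```python
import math


def kaya_factors(population, gdp, energy, emissions):
    """Return the three Kaya ratios (G/P, E/G, F/E)."""
    return gdp / population, energy / gdp, emissions / energy


def kaya_emissions(population, gdp_per_capita, energy_intensity, carbon_intensity):
    """Recombine the factors: F = P * (G/P) * (E/G) * (F/E)."""
    return population * gdp_per_capita * energy_intensity * carbon_intensity


# Illustrative placeholder values in arbitrary consistent units (not real data).
P, G, E, F = 1.0e9, 1.0e13, 5.0e13, 3.5e12
factors = kaya_factors(P, G, E, F)
assert math.isclose(kaya_emissions(P, *factors), F)

# Lever example: halving the energy intensity E/G, all else fixed, halves F.
gdp_pc, energy_int, carbon_int = factors
assert math.isclose(kaya_emissions(P, gdp_pc, energy_int / 2, carbon_int), F / 2)
```

Because the identity is multiplicative, a relative change in any one factor produces the same relative change in \(F\).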
The fractional terms correspond to well studied quantities:
\(G/P\) is the GDP per capita; \(E/G\) is the energy intensity of the GDP; \(F/E\) is the carbon footprint of energy. Interpretation This identity is simply a rewrite of \(F=F\) in terms of commonly used quantities to highlight several levers one could act on to reduce CO2 emissions.</description></item><item><title>Konrad Zuse</title><link>https://hugocisneros.com/notes/konrad_zuse/</link><pubDate>Tue, 14 Jul 2020 08:24:00 +0200</pubDate><guid>https://hugocisneros.com/notes/konrad_zuse/</guid><description>resources Juergen Schmidhuber&amp;rsquo;s page In 1941, he constructed the first fully functional programmable computer, the Z3.
He suggested in 1967 in his book Calculating Space that the universe is running on a Cellular automaton. This is now known as Zuse&amp;rsquo;s thesis.</description></item><item><title>Language</title><link>https://hugocisneros.com/notes/language/</link><pubDate>Tue, 14 Jul 2020 08:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/language/</guid><description/></item><item><title>Logic</title><link>https://hugocisneros.com/notes/logic/</link><pubDate>Tue, 14 Jul 2020 08:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/logic/</guid><description/></item><item><title>Logical depth</title><link>https://hugocisneros.com/notes/logical_depth/</link><pubDate>Tue, 14 Jul 2020 08:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/logical_depth/</guid><description>tags Complexity metrics references (Bennett 1995) Logical depth can be defined as the run time of the Turing Machine that uses the minimal representation \(M_{\min}(x)\) of an input \(x\) (the length of this representation is the Kolmogorov complexity of \(x\)). It is therefore uncomputable, because the minimal representation is uncomputable.
Bibliography Charles H. Bennett. 1995. "Logical Depth and Physical Complexity". In The Universal Turing Machine A Half-Century Survey, edited by Rolf Herken, 2:207–35.</description></item><item><title>Mathematics</title><link>https://hugocisneros.com/notes/mathematics/</link><pubDate>Tue, 14 Jul 2020 08:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/mathematics/</guid><description/></item><item><title>Mean field theory of neural networks (talk)</title><link>https://hugocisneros.com/notes/mean_field_theory_of_neural_networks/</link><pubDate>Tue, 14 Jul 2020 08:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/mean_field_theory_of_neural_networks/</guid><description>speaker Andrea Montanari tags Neural networks Two layers Neural nets to Wasserstein gradient flows Classical Supervised learning setting
</description></item><item><title>Morphogenesis</title><link>https://hugocisneros.com/notes/morphogenesis/</link><pubDate>Tue, 14 Jul 2020 08:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/morphogenesis/</guid><description> tags Biological life, Physics</description></item><item><title>Ontogeny recapitulates phylogeny</title><link>https://hugocisneros.com/notes/ontogeny_recapitulates_phylogeny/</link><pubDate>Mon, 13 Jul 2020 18:40:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ontogeny_recapitulates_phylogeny/</guid><description>tags Evolution, Biological life link Wikipedia This is a generalization principle in biology stating that the stages of development of an organism often resemble some of its ancestors.</description></item><item><title>Public key encryption</title><link>https://hugocisneros.com/notes/public_key_encryption/</link><pubDate>Mon, 13 Jul 2020 18:39:00 +0200</pubDate><guid>https://hugocisneros.com/notes/public_key_encryption/</guid><description> tags Cryptography RSA Diffie-Hellman Elliptic curve cryptography</description></item><item><title>Autoencoders</title><link>https://hugocisneros.com/notes/autoencoders/</link><pubDate>Mon, 13 Jul 2020 10:19:00 +0200</pubDate><guid>https://hugocisneros.com/notes/autoencoders/</guid><description>tags Neural networks, Data representation Autoencoders and PCA nn The relation between Autoencoders and PCA is strong. In particular, a very small autoencoder with only linear activations seems intuitively very close to a PCA decomposition. (Bourlard, Kamp 1988) gives an interesting analysis of the uselessness of the activation functions in the encoding layers of an autoencoder when there are no activations in the output layer.
In that case, autoencoding is closely related to a singular value decomposition of the input data.</description></item><item><title>Lisp</title><link>https://hugocisneros.com/notes/lisp/</link><pubDate>Fri, 10 Jul 2020 11:17:00 +0200</pubDate><guid>https://hugocisneros.com/notes/lisp/</guid><description>tags Programming languages Lisp has been a popular family of languages for Artificial Intelligence research, from the 1970s to the 1990s.</description></item><item><title>Reaction-diffusion</title><link>https://hugocisneros.com/notes/reaction_diffusion/</link><pubDate>Fri, 10 Jul 2020 10:05:00 +0200</pubDate><guid>https://hugocisneros.com/notes/reaction_diffusion/</guid><description> tags Physics, Morphogenesis</description></item><item><title>Theory of computation</title><link>https://hugocisneros.com/notes/theory_of_computation/</link><pubDate>Fri, 10 Jul 2020 09:15:00 +0200</pubDate><guid>https://hugocisneros.com/notes/theory_of_computation/</guid><description> tags Computer science</description></item><item><title>Algorithm</title><link>https://hugocisneros.com/notes/algorithm/</link><pubDate>Fri, 10 Jul 2020 09:13:00 +0200</pubDate><guid>https://hugocisneros.com/notes/algorithm/</guid><description> tags Computer science, Coding</description></item><item><title>Computer science</title><link>https://hugocisneros.com/notes/computer_science/</link><pubDate>Fri, 10 Jul 2020 09:12:00 +0200</pubDate><guid>https://hugocisneros.com/notes/computer_science/</guid><description/></item><item><title>Rice’s theorem</title><link>https://hugocisneros.com/notes/rice_s_theorem/</link><pubDate>Thu, 09 Jul 2020 14:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/rice_s_theorem/</guid><description/></item><item><title>RNA-world</title><link>https://hugocisneros.com/notes/rna_world/</link><pubDate>Thu, 09 Jul 2020 14:31:00 +0200</pubDate><guid>https://hugocisneros.com/notes/rna_world/</guid><description> tags Biological life</description></item><item><title>SIR model</title><link>https://hugocisneros.com/notes/sir_model/</link><pubDate>Thu, 09 Jul 2020 14:29:00 +0200</pubDate><guid>https://hugocisneros.com/notes/sir_model/</guid><description>tags Applied maths resources Wikipedia Simplest form The SIR model is defined for a population of size \(N\), with \(S\) the number of susceptible people, \(I\) the number of infected people and \(R\) the number of people who have recovered. The following system of differential equations governs the evolution of those three variables:
\[ \frac{dS}{dt} = - \frac{\beta I S}{N} \] \[ \frac{dI}{dt} = \frac{\beta I S }{N}- \gamma I \] \[ \frac{dR}{dt} = \gamma I \]</description></item><item><title>Statistical physics</title><link>https://hugocisneros.com/notes/statistical_physics/</link><pubDate>Thu, 09 Jul 2020 14:29:00 +0200</pubDate><guid>https://hugocisneros.com/notes/statistical_physics/</guid><description> tags Physics, Statistics</description></item><item><title>Statistics</title><link>https://hugocisneros.com/notes/statistics/</link><pubDate>Thu, 09 Jul 2020 14:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/statistics/</guid><description> tags Applied maths</description></item><item><title>C Programming language</title><link>https://hugocisneros.com/notes/c_programming_language/</link><pubDate>Thu, 09 Jul 2020 12:44:00 +0200</pubDate><guid>https://hugocisneros.com/notes/c_programming_language/</guid><description/></item><item><title>C++</title><link>https://hugocisneros.com/notes/c/</link><pubDate>Thu, 09 Jul 2020 12:44:00 +0200</pubDate><guid>https://hugocisneros.com/notes/c/</guid><description> tags Programming languages, Coding</description></item><item><title>Surprisingly Turing-Complete</title><link>https://hugocisneros.com/notes/surprisingly_turing_complete/</link><pubDate>Thu, 09 Jul 2020 12:43:00 +0200</pubDate><guid>https://hugocisneros.com/notes/surprisingly_turing_complete/</guid><description>tags Turing-completeness source Gwern Branwen&amp;rsquo;s website Turing-completeness is common TC [Turing-completeness], [&amp;hellip;] is [&amp;hellip;] weirdly common: one might think that such universality as a system being smart enough to be able to run any program might be difficult or hard to achieve, but it turns out to be the opposite and it is difficult to write a useful system which does not immediately tip over into TC.
I&amp;rsquo;ve often been amazed at how common TC can be in sufficiently complicated systems.</description></item><item><title>Symmetric encryption</title><link>https://hugocisneros.com/notes/symmetric_encryption/</link><pubDate>Thu, 09 Jul 2020 12:43:00 +0200</pubDate><guid>https://hugocisneros.com/notes/symmetric_encryption/</guid><description> tags Cryptography</description></item><item><title>Talk: Artificial Intelligence: A Guide for Thinking Humans</title><link>https://hugocisneros.com/notes/talk_artificial_intelligence_a_guide_for_thinking_humans/</link><pubDate>Thu, 09 Jul 2020 12:43:00 +0200</pubDate><guid>https://hugocisneros.com/notes/talk_artificial_intelligence_a_guide_for_thinking_humans/</guid><description>presenter Melanie Mitchell source Youtube Talk at the Santa Fe Institute on Nov 13, 2019.
What is Artificial Intelligence? Many different things fall under the name AI (self-driving cars, chess-playing machines, image classifiers, video game AIs, etc.).
[Building] machines that perform tasks normally requiring human intelligence. &amp;mdash; Nils Nilsson, 1971
Chess was thought to be the pinnacle of intelligence until a brute-force approach was found that could beat any human player.</description></item><item><title>Talk: Differentiation of black-box combinatorial solvers</title><link>https://hugocisneros.com/notes/talk_differentiation_of_black_box_combinatorial_solvers/</link><pubDate>Thu, 09 Jul 2020 12:43:00 +0200</pubDate><guid>https://hugocisneros.com/notes/talk_differentiation_of_black_box_combinatorial_solvers/</guid><description>presenter Michal Rolinek tags Combinatorics, Machine learning The goal is to merge combinatorial optimization and deep learning.
Make use of strong, battle-tested optimization methods. Some of them can find near-optimal solutions to NP-hard problems in roughly quadratic time.
The goal is to cover many combinatorial problems (TSP, multi-cut, etc.).
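A minimal sketch, assuming the talk follows the interpolation scheme of Vlastelica, Paulus, Musil, Martius and Rolinek (ICLR 2020): the forward pass calls the solver unchanged; the backward pass perturbs the costs as w' = w + lam * dL/dy, calls the solver once more, and returns -(y(w) - y(w')) / lam. The toy `solver` (pick the cheapest item) and the helper `blackbox_backward` are hypothetical stand-ins, not the authors' code.

```python
import numpy as np


def solver(w: np.ndarray) -> np.ndarray:
    """Toy black-box combinatorial solver (hypothetical stand-in).

    Returns the one-hot vector y(w) minimizing the linear cost w . y over the
    n unit vectors, i.e. it picks the single cheapest item. A real use would
    call a shortest-path, TSP or multi-cut solver here instead.
    """
    y = np.zeros_like(w)
    y[np.argmin(w)] = 1.0
    return y


def blackbox_backward(w: np.ndarray, grad_y: np.ndarray, lam: float) -> np.ndarray:
    """Gradient of the loss w.r.t. the costs w via the interpolation trick.

    The costs are perturbed along the incoming gradient dL/dy, the solver is
    called once more, and the scaled difference of the two solutions is
    returned. The only overhead is one extra solver call.
    """
    y = solver(w)
    y_prime = solver(w + lam * grad_y)
    return -(y - y_prime) / lam


# Usage: the costs favor item 1, but the loss wants item 2 to be chosen.
w = np.array([3.0, 1.0, 2.0])
y = solver(w)
y_target = np.array([0.0, 0.0, 1.0])
grad_y = 2.0 * (y - y_target)  # dL/dy for L = ||y - y_target||^2
grad_w = blackbox_backward(w, grad_y, lam=1.0)
w_new = w - 1.0 * grad_w  # one gradient-descent step on the costs
print(grad_w, solver(w_new))
```

The solver itself is never relaxed or modified; it is only queried on perturbed costs, which is what keeps the backward pass cheap. The hyperparameter `lam` trades off how informative the gradient is against how faithfully the interpolation follows the original piecewise-constant loss.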
The method should have a fast backward pass, be theoretically sound, and be easy to use. However, the goal is not to take a combinatorial problem and simply relax it to make it differentiable, because there is often a huge price to pay for such relaxations.</description></item><item><title>The Lottery ticket hypothesis</title><link>https://hugocisneros.com/notes/the_lottery_ticket_hypothesis/</link><pubDate>Thu, 09 Jul 2020 12:42:00 +0200</pubDate><guid>https://hugocisneros.com/notes/the_lottery_ticket_hypothesis/</guid><description>tags Neural network training resources The AI podcast papers (Frankle, Carbin 2018) When training very large neural networks, the resulting network may contain many unused neurons. Neural network pruning can remove many of these unused connections, making the overall architecture lighter and faster to run on some hardware.
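A minimal sketch of one common pruning criterion, magnitude pruning (an assumption here; the note does not say which criterion it has in mind): the connections whose weights have the smallest absolute values are treated as unused and masked out. `magnitude_prune_mask` is a hypothetical helper, not a library function.

```python
import numpy as np


def magnitude_prune_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a 0/1 mask that removes the smallest-magnitude weights.

    `sparsity` is the fraction of weights to remove (0.0 keeps all of them).
    Weights tied with the threshold are removed as well.
    """
    magnitudes = np.abs(weights).ravel()
    k = int(round(sparsity * magnitudes.size))  # number of weights to drop
    if k == 0:
        return np.ones_like(weights)
    threshold = np.sort(magnitudes)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)


w = np.array([[0.1, -2.0], [0.5, -0.05]])
mask = magnitude_prune_mask(w, sparsity=0.5)
w_pruned = w * mask  # pruned connections are set to zero
print(mask)
print(w_pruned)
```

The mask, not the zeroed weights, is what defines the pruned architecture; keeping it separate makes it easy to apply the same sparsity pattern to a freshly initialized network.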
However, once pruned, the architecture often fails to learn anything interesting when it is retrained from scratch.</description></item><item><title>Decentralization</title><link>https://hugocisneros.com/notes/decentralization/</link><pubDate>Thu, 09 Jul 2020 11:12:00 +0200</pubDate><guid>https://hugocisneros.com/notes/decentralization/</guid><description> tags Economics</description></item><item><title>Physics</title><link>https://hugocisneros.com/notes/physics/</link><pubDate>Thu, 02 Jul 2020 10:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/physics/</guid><description/></item><item><title>Data representation</title><link>https://hugocisneros.com/notes/data_representation/</link><pubDate>Thu, 02 Jul 2020 10:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/data_representation/</guid><description>tags Machine learning Data representation is about finding compact representations of high-dimensional data (such as images, videos, 3D shapes, etc.).
Several methods have been developed for this purpose, such as PCA, neural-network-based representations and autoencoders.</description></item><item><title>Evolution</title><link>https://hugocisneros.com/notes/evolution/</link><pubDate>Thu, 02 Jul 2020 10:22:00 +0200</pubDate><guid>https://hugocisneros.com/notes/evolution/</guid><description> tags Artificial life, Life</description></item><item><title>Downs–Thomson paradox</title><link>https://hugocisneros.com/notes/downs_thomson_paradox/</link><pubDate>Thu, 02 Jul 2020 10:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/downs_thomson_paradox/</guid><description/></item><item><title>Economic liberalism</title><link>https://hugocisneros.com/notes/economic_liberalism/</link><pubDate>Thu, 02 Jul 2020 10:21:00 +0200</pubDate><guid>https://hugocisneros.com/notes/economic_liberalism/</guid><description>tags Economics Definition The weekly newspaper The Economist is often described as having economic liberalism as one of its political alignments.
Decentralization, globalization and economic liberalism Many advocates of economic liberalism consider that economic decisions should follow natural tendencies. To me, this is related to energy-minimization principles, where letting a system evolve freely should lead to the optimal configuration.
In the case of decentralization, an almost immediate effect is a decrease in the prices of some common goods, which usually makes people happier.</description></item><item><title>Boltzmann brain</title><link>https://hugocisneros.com/notes/boltzmann_brain/</link><pubDate>Thu, 02 Jul 2020 10:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/boltzmann_brain/</guid><description>tags Physics, Statistical physics Boltzmann brain thoughts The Boltzmann brain is an interesting concept, initially proposed in response to one of Ludwig Boltzmann&amp;rsquo;s explanations for the low-entropy state of the Universe. He hypothesized that even a fully random universe would fluctuate towards lower-entropy states. The issue is that many phenomena, such as evolved life on Earth, are so far from equilibrium that they seem extremely unlikely to have happened.</description></item><item><title>Combinatorics</title><link>https://hugocisneros.com/notes/combinatorics/</link><pubDate>Thu, 02 Jul 2020 10:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/combinatorics/</guid><description> tags Mathematics</description></item><item><title>John Conway</title><link>https://hugocisneros.com/notes/john_conway/</link><pubDate>Thu, 02 Jul 2020 10:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/john_conway/</guid><description> tags Mathematics</description></item><item><title>Marvin Minsky</title><link>https://hugocisneros.com/notes/marvin_minsky/</link><pubDate>Thu, 02 Jul 2020 10:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/marvin_minsky/</guid><description/></item><item><title>Neural network pruning</title><link>https://hugocisneros.com/notes/neural_network_pruning/</link><pubDate>Thu, 02 Jul 2020 10:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/neural_network_pruning/</guid><description>tags Neural networks papers (LeCun et al. 1990; Hassibi, Stork 1993; Han et al. 2015; Li et al. 2016) Bibliography Yann LeCun, John S. Denker, Sara A. Solla. 1990. 
"Optimal Brain Damage". In Advances in Neural Information Processing Systems, 598–605. Babak Hassibi, David G. Stork. 1993. "Second Order Derivatives for Network Pruning: Optimal Brain Surgeon". In Advances in Neural Information Processing Systems, 164–171. Song Han, Jeff Pool, John Tran, William Dally.</description></item><item><title>Ruby</title><link>https://hugocisneros.com/notes/ruby/</link><pubDate>Thu, 02 Jul 2020 10:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/ruby/</guid><description> tags Programming languages, Coding</description></item><item><title>Scala</title><link>https://hugocisneros.com/notes/scala/</link><pubDate>Thu, 02 Jul 2020 10:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/scala/</guid><description> tags Programming languages, Coding</description></item><item><title>Abelian sandpile model</title><link>https://hugocisneros.com/notes/abelian_sandpile_model/</link><pubDate>Thu, 02 Jul 2020 10:09:00 +0200</pubDate><guid>https://hugocisneros.com/notes/abelian_sandpile_model/</guid><description> tags Cellular automata resources Wikipedia</description></item><item><title>Hyperbolic geometry</title><link>https://hugocisneros.com/notes/hyperbolic_geometry/</link><pubDate>Thu, 02 Jul 2020 09:46:00 +0200</pubDate><guid>https://hugocisneros.com/notes/hyperbolic_geometry/</guid><description> tags Mathematics</description></item><item><title>Chaos</title><link>https://hugocisneros.com/notes/chaos/</link><pubDate>Thu, 02 Jul 2020 08:45:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chaos/</guid><description>tags Physics Chaos is a striking example of emergence: deterministic equations of motion lead to behavior that becomes completely unpredictable over time. Randomness emerges from these deterministic laws.
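A minimal illustration, not from the source (the choice of system is an assumption): the logistic map x -> r x (1 - x) with r = 4 is a textbook example of deterministic chaos. The update rule is fully deterministic, yet two starting points 1e-10 apart end up with unrelated trajectories after a few dozen iterations, which is the measurement-precision problem described in the quote that follows. `logistic_trajectory` is a hypothetical helper.

```python
def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 80) -> list[float]:
    """Iterate the deterministic logistic map x -> r * x * (1 - x) from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two initial conditions that differ by 1e-10, far below any realistic
# measurement precision.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)
gaps = [abs(x - y) for x, y in zip(a, b)]

# With r = 4 the gap roughly doubles at each step on average, until it is as
# large as the state itself and the two trajectories become unrelated.
print(f"gap after 1 step: {gaps[1]:.1e}")
print(f"largest gap over {len(gaps) - 1} steps: {max(gaps):.2f}")
```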
From (Crutchfield 1994):
Where in the determinism did the randomness come from? The answer is that the effective dynamic, which maps from initial conditions to states at a later time, becomes so complicated that an observer can neither measure the system accurately enough nor compute with sufficient power to predict the future behavior when given an initial condition.</description></item><item><title>Complexity</title><link>https://hugocisneros.com/notes/complexity/</link><pubDate>Thu, 02 Jul 2020 08:39:00 +0200</pubDate><guid>https://hugocisneros.com/notes/complexity/</guid><description>resources Page of Pablo Funes&amp;rsquo; PhD thesis What is complexity? The question is far too vast to be answered in anything shorter than a whole book. I am planning to dedicate an entire post to measuring complexity with a range of metrics that people have come up with in the past. A big question I&amp;rsquo;m asking myself is: &amp;ldquo;How much does complexity depend on subjectivity and the observer?&amp;rdquo;</description></item><item><title>Make</title><link>https://hugocisneros.com/notes/make/</link><pubDate>Wed, 01 Jul 2020 20:23:00 +0200</pubDate><guid>https://hugocisneros.com/notes/make/</guid><description>tags Coding Make is a build automation tool.
Don&amp;rsquo;t deal with tabs I have been annoyed by tabs in Makefiles many times. Some editors and copy-paste functions automatically convert tabs to spaces (and vice versa), which can break a Makefile, since by default every recipe line must start with a tab.
Since GNU Make 3.82, the recipe prefix can be changed from a tab to another character through the special variable .RECIPEPREFIX. To use &gt; as a prefix, put this at the beginning of your Makefile: .RECIPEPREFIX = &gt;</description></item><item><title>Cellular automata as regular languages</title><link>https://hugocisneros.com/notes/cellular_automata_as_regular_languages/</link><pubDate>Wed, 01 Jul 2020 14:20:00 +0200</pubDate><guid>https://hugocisneros.com/notes/cellular_automata_as_regular_languages/</guid><description>tags Cellular automata, Finite state machines From (Hanson, Crutchfield 1997):
Finite state machines are appropriate for investigating pattern dynamics of CAs for a number of reasons, among which we may note the following:
FAs encompass the full range of behavior types from periodic to complex to random; Characterization of patterns using FAs makes possible a definition of pattern complexity which is both natural and computable in practice; Ensemble evolution in the space of regular languages is closed under the CA rule; The CA update rule is itself an FST; Automated inference techniques exist for reconstructing FAs from experimental data Bibliography James E.</description></item><item><title>Chemical reaction network</title><link>https://hugocisneros.com/notes/chemical_reaction_network/</link><pubDate>Wed, 01 Jul 2020 08:44:00 +0200</pubDate><guid>https://hugocisneros.com/notes/chemical_reaction_network/</guid><description> tags Complex Systems</description></item><item><title>Kernel Methods</title><link>https://hugocisneros.com/notes/kernel_methods/</link><pubDate>Wed, 01 Jul 2020 08:14:00 +0200</pubDate><guid>https://hugocisneros.com/notes/kernel_methods/</guid><description> tags Machine learning</description></item><item><title>Philosophy</title><link>https://hugocisneros.com/notes/philosophy/</link><pubDate>Fri, 26 Jun 2020 11:28:00 +0200</pubDate><guid>https://hugocisneros.com/notes/philosophy/</guid><description/></item><item><title>Notes Graph</title><link>https://hugocisneros.com/notes/note-graph/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://hugocisneros.com/notes/note-graph/</guid><description/></item></channel></rss>