Reference

Ever learn something and need to be able to find it later? Post it here for quick searchable access in the future.

Posted At: 9/17/2025 8:07:22 AM
Posted By: Comfortably Anonymous
Viewed: 67 times
Likes: 0 Dislikes: 0
How a Large Language Model (LLM) Works (A duck-tastic tour for a bachelor-level reader)

1. The "Big Brain" in a nutshell
Imagine a giant, super-smart brain that sits in a server room. That brain is made of layers of mathematical "cells" called neural networks. An LLM (Large Language Model) is just a particular type of neural network that has learned to generate, translate, answer questions, and more, all by reading a huge amount of text from the internet, books, articles, and so on. Think of it as a student who has read almost every book and article that ever existed, and now can write essays, answer trivia, or even help you solve a math problem. The student learned patterns in the text, not facts about the world. That's why it can produce creative, plausible sentences but sometimes hallucinate.

2. The building blocks
Building Block | What it does | Simple analogy
Tokenization | Breaks text into "words" or sub-words (tokens). | Splitting a sentence into LEGO bricks.
Embeddings | Turns tokens into numeric vectors (lists of numbers). | Turning each LEGO brick into a colored block that tells the model how "similar" it is to others.
Transformer layers | Compute relationships between tokens using attention. | A group of students in a classroom pointing to each other's notes to decide who needs help.
Attention mechanism | Gathers context from all positions in the sentence. | Looking at all classmates' notes before answering a question.
Feed-forward network | A small neural net that refines the attention output. | The teacher's feedback that makes the answer clearer.
Softmax output | Picks the next token based on probabilities. | Voting on which word should come next.

3. From raw text to a "smart" brain
3.1 Tokenization ...
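To make the building-block table concrete, here is a tiny PyTorch sketch of the same pipeline: tokenize, embed, attend, and pick the next token with a softmax. The four-word vocabulary, the random weights, and the two-word prompt are made up for illustration only; a real LLM has tens of thousands of tokens and billions of trained weights.

    # Toy walk-through of the building blocks above; the vocabulary and weights
    # are invented for the example, nothing here is a trained model.
    import torch
    import torch.nn.functional as F

    vocab = {"the": 0, "duck": 1, "swims": 2, "quacks": 3}
    inv_vocab = {i: w for w, i in vocab.items()}

    def tokenize(text):
        return [vocab[w] for w in text.split()]      # split into "LEGO bricks"

    hidden = 8
    emb = torch.nn.Embedding(len(vocab), hidden)     # tokens -> numeric vectors

    ids = torch.tensor(tokenize("the duck"))
    x = emb(ids)                                     # shape (2, hidden)

    # Attention: every position compares itself with every other position.
    scores = x @ x.T / hidden ** 0.5                 # similarity scores
    attn = F.softmax(scores, dim=-1)                 # attention weights
    context = attn @ x                               # context-mixed vectors

    # Output head: project the last position onto the vocabulary and turn the
    # scores into probabilities with a softmax ("voting" on the next word).
    out_proj = torch.nn.Linear(hidden, len(vocab))
    probs = F.softmax(out_proj(context[-1]), dim=-1)
    next_id = torch.multinomial(probs, 1).item()
    print(inv_vocab[next_id])

Because the weights are random, the "vote" is random too; after training, the probabilities would concentrate on plausible continuations.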
Posted At: 9/17/2025 8:04:21 AM
Posted By: Comfortably Anonymous
Viewed: 65 times
Likes: 0 Dislikes: 0
Overview
Large-Language Models (LLMs) are transformer-based neural nets trained on terabytes of text to learn a contextual distribution over tokens. The training pipeline is a blend of data engineering, deep-learning research, and large-scale distributed systems. Below is a "full stack" walk-through aimed at someone who's built distributed software and written low-level code.

1. Data Pipeline
Stage | What you do | Why it matters
Corpus collection | Web-scraping, public corpora (Common Crawl, Wikipedia, books, code), proprietary data. | Determines the model's inductive biases and knowledge.
Deduplication / filtering | Remove exact or near-duplicates, profanity filters, harmful content flags. | Prevents data leakage, reduces redundancy, keeps the dataset clean.
Sharding | Split the corpus into shards (~1 GB each). | Enables parallel ingestion, deterministic random access.
Tokenization | Byte-Pair Encoding (BPE), SentencePiece, or WordPiece. | Turns raw text into integer IDs; the choice influences vocabulary size and OOV handling.
Pre-tokenization transforms | Lowercasing, whitespace normalisation, special token insertion (, ). | Standardises the input and helps the model learn boundaries.
Dataset construction | Convert shards to a dataset that supports streaming: TextDataset, IterableDataset. | Allows reading data without keeping it all in memory.
Sequence packing | For transformer training, break each shard into fixed-length token sequences (e.g. 2048 tokens). | Enables efficient batching and padding-free computation.
Bucketing / collate | Group similar-length sequences together. | Minimises padding, improves GPU utilisation.

Implementation hint: Use torch.utils.data.IterableDataset with a generator that reads the shards sequentially, then yields mini-batches to a collate function that pads to the max length in the batch. (A minimal sketch follows at the end of this post.)

2. Model Architecture
Sub-module | Purpose | Typical design choices
Embedding layer | Turns token IDs into dense vectors. | nn.Embedding(vocab_size, hidden_dim); optionally tied to the output projection.
Positional encoding | Adds order information. | Learned positional embeddings or sinusoidal embeddings.
Transformer blocks | Core transformer: self-attention + MLP + residuals. | 12–96 layers, 768–12288 hidden dim, 12–128 heads.
LayerNorm | Stabilises training. | Pre- or post-LayerNorm depending on the ...
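A minimal sketch of the implementation hint above, assuming shards stored as text files with one pre-tokenised sequence of integer IDs per line and pad ID 0 (both are assumptions for illustration; the shard path is hypothetical):

    # Streams shards sequentially, packs them into fixed-length sequences,
    # and pads each mini-batch to its longest sequence.
    import glob
    import torch
    from torch.utils.data import DataLoader, IterableDataset

    class ShardStream(IterableDataset):
        def __init__(self, pattern, seq_len=2048):
            self.files = sorted(glob.glob(pattern))   # hypothetical shard files
            self.seq_len = seq_len

        def __iter__(self):
            for path in self.files:                   # read shards sequentially
                with open(path) as f:
                    for line in f:
                        ids = [int(t) for t in line.split()]
                        # sequence packing: fixed-length chunks of token IDs
                        for i in range(0, len(ids), self.seq_len):
                            yield torch.tensor(ids[i:i + self.seq_len])

    def collate(batch):
        # pad to the max length in the batch (pad ID 0 assumed)
        return torch.nn.utils.rnn.pad_sequence(batch, batch_first=True, padding_value=0)

    loader = DataLoader(ShardStream("shards/*.txt"), batch_size=8, collate_fn=collate)

In a real run you would typically also shuffle shard order per epoch and partition shards across data-parallel workers, but the streaming-plus-collate skeleton stays the same.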
Posted At: 11/7/2022 11:08:40 AM
Posted By: Comfortably Anonymous
Viewed: 3836 times
Likes: 0 Dislikes: 0
As of version 1.69, VSCode has an excellent three-pane merge tool on par with the GitKraken and Beyond Compare merge tools. Here's how to configure Hg to use it (on Linux anyway; for other OSes you will have to keep searching...).

I'm pretty new to Mercurial, being totally used to Git and GitKraken, so I've been a little lost with handling merges, as most Linux-based merge tools make no sense to me as far as understanding how to work with their UI. (kdiff, meld, and vimdiff are all off on another planet compared to what I'm used to.)

So happily I found this article (Using VS Code for merges in Mercurial) explaining just how to do that. It has additional info that you will find useful, but I wanted to capture the basics just in case the article goes away.

Base changes needed in your ~/.hgrc file:

[extensions]
extdiff =

[ui]
merge = code

[merge-tools]
code.priority = 100
code.premerge = True
code.args = --wait --merge $other $local $base $output

[extdiff]
cmd.vsd = code
opts.vsd = --wait --diff
Posted At: 10/26/2022 3:19:02 PM
Posted By: Comfortably Anonymous
Viewed: 1870 times
Likes: 0 Dislikes: 0
A little embarrassed that I did not know this, even though I have used PDFs for decades and have always been frustrated with them: every time you re-open a PDF, you end up right at the first page of the document and have to try to remember where you were the last time. Really a pain when reading a long document or book that will take more than one sitting to get all the way through.

I've always looked for a way to "bookmark" the current page, just like you would in a book. But Adobe had other uses for the word "bookmark" when they created Adobe Reader. A "bookmark" to them is one of many shortcuts in their "bookmarks" list. Even worse, you cannot create one in the free version of Acrobat Reader. So that sucks.

So how do you do this with the free version of Adobe Acrobat Reader? Well... It's embarrassingly simple, but this is all you have to do: Click on Edit, then Preferences, then Documents, then put a checkmark in for "Restore last view settings when reopening documents". Click OK. That's IT!!

Scroll down to some page, close Reader, re-open, and VOILA - you are right at the page you were at when you closed the document.

(Damn Adobe, that's some terrible User Experience there. That should be front-and-center. I can't think of anything more important in a book metaphor than how to mark your last page read. Normally, you use a bookmark with a physical book, but with a ...
Posted At: 12/23/2021 1:34:17 PM
Posted By: Comfortably Anonymous
Viewed: 2511 times
Likes: 0 Dislikes: 0
For lowerCamelCase:

SEARCH = (?-is)([a-zA-Z])([a-zA-Z]+)|(\b|_)[a-zA-Z](\b|_)|[^a-zA-Z\r\n]+
REPLACE = \u\1\L\2

then:

SEARCH = (?-is)([A-Z][a-z]+){2,}
REPLACE = \l$0

BEFORE:
Hey everyone - We all need to get together and see if we can come up with a solution to make sure we can't experience this so-called "problem" again. (THIS MEANS YOU 123BOB!!)

AFTER:
heyEveryoneWeAllNeedToGetTogetherAndSeeIfWeCanComeUpWithASolutionToMakeSureWeCantExperienceThisSoCalledProblemAgainThisMeansYou123Bob
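If you want the same transformation outside Notepad++, here is a small Python sketch. It is not a literal translation of the regexes above (Python's re module has no equivalent of the Boost \u and \L case-conversion escapes used in the REPLACE strings); it is a simpler equivalent that keeps letters and digits, drops everything else, and lowercases the first word.

    import re

    def lower_camel_case(text: str) -> str:
        # Keep runs of letters (apostrophes allowed inside words) and digits.
        raw = re.findall(r"[A-Za-z']+|[0-9]+", text)
        words = [w.replace("'", "") for w in raw if w.replace("'", "")]
        if not words:
            return ""
        head, *tail = words
        # First word lowercased, every later word capitalised; digits pass through.
        return head.lower() + "".join(w.capitalize() for w in tail)

    before = ('Hey everyone - We all need to get together and see if we can '
              'come up with a solution to make sure we can\'t experience this '
              'so-called "problem" again. (THIS MEANS YOU 123BOB!!)')
    print(lower_camel_case(before))  # matches the AFTER example above

Edge cases (curly apostrophes, leading digits, underscores) may come out differently from the regex version, so treat it as a starting point.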