4 changes: 3 additions & 1 deletion .gitignore
@@ -67,7 +67,9 @@ Project/CWSNER/model/

# NCTU_DL_HW1
# TibetanMNIST
Project/NCTU_DL_HW1/TibetanMNIST.npz
Project/NCTU_DL/HW1/TibetanMNIST-DNN/TibetanMNIST.npz
# NCTU_DL_HW2
Project/NCTU_DL/HW2/*.zip

# Byte-compiled / optimized / DLL files
__pycache__/
1 change: 1 addition & 0 deletions .lfsconfig
@@ -1,2 +1,3 @@
[lfs]
url = https://gitlab.com/daviddwlee84/DeepLearningPractice.git/info/lfs

6 changes: 6 additions & 0 deletions Notes/Application/NLP/EnglishPreprocessing.md
@@ -0,0 +1,6 @@
# English Preprocessing

* Tokenizer
* Stop Words

[All you need to know about NLP Text Preprocessing](https://gdcoder.com/all-you-need-to-know-about-nlp-text-preprocessing/)
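
A minimal sketch of both steps, assuming Python with NLTK (the sample sentence is illustrative, and newer NLTK releases may also require the `punkt_tab` resource):

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time resource downloads (cached after the first run).
nltk.download("punkt")
nltk.download("stopwords")

text = "This is an example sentence showing off stop word filtering."
tokens = word_tokenize(text.lower())               # tokenization
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]
print(filtered)  # ['example', 'sentence', 'showing', 'stop', 'word', 'filtering']
```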
71 changes: 71 additions & 0 deletions Notes/Concept/DataSmoothing.md
@@ -0,0 +1,71 @@
# Data Smoothing

## Background

### [N-gram](N-GramModel.md)

### [Data Sparsity](N-GramModel.md#Data-Sparseness)

## Overview of Smoothing Technique

Simple Smoothing

* Additive smoothing
* Add-one smoothing
* Held-out Estimation (留存估計)
* Deleted Estimation / Two-way Cross Validation (刪除估計)
* Good Turing smoothing
* ... etc.

Combination Smoothing

* Interpolation smoothing (插值)
* Jelinek-Mercer smoothing
* Katz smoothing (backoff) (退回模型)
* Kneser-Ney smoothing

## Simple Smoothing

> All n-grams that never appeared are assigned the same probability.

### Add-one Smoothing

> Add one to the frequency of each n-gram.

#### Additive Smoothing

> Add $\delta$ instead of one to the frequency of each n-gram
> (typically $0 < \delta \leq 1$).
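
A minimal sketch of additive smoothing over bigram counts, assuming Python (the toy corpus and function name are illustrative):

```python
from collections import Counter

def add_delta_prob(bigram, bigram_counts, unigram_counts, vocab_size, delta=0.5):
    """P(w2 | w1) with additive smoothing:
    (count(w1 w2) + delta) / (count(w1) + delta * |V|).
    delta = 1 gives add-one (Laplace) smoothing."""
    w1, w2 = bigram
    return (bigram_counts[bigram] + delta) / (unigram_counts[w1] + delta * vocab_size)

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
print(add_delta_prob(("the", "cat"), bigrams, unigrams, len(unigrams)))  # seen bigram
print(add_delta_prob(("the", "dog"), bigrams, unigrams, len(unigrams)))  # unseen, still > 0
```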

### Held-out Estimation

> A good method when the corpus is large, since it needs to split the data into two sets.
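
For reference, the usual form of the estimate (added here, not in the original note): with $T_r$ the total count in the held-out set of all n-grams that occurred $r$ times in training, $N_r$ the number of such n-grams, and $N'$ the held-out set size,

$$
p_{ho} (\text{an n-gram occurring } r \text{ times}) = \frac{T_r}{N_r \cdot N'}
$$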

#### Deleted Estimation / Two-way Cross Validation

> Preferred when the corpus is small.

### Good Turing Smoothing

$$
p_{GT} (\text{an n-gram occurring } r \text{ times}) = \frac{(r+1)N_{r+1}}{N\cdot N_r}
$$

* [Wiki - Good–Turing frequency estimation](https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation)
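
A minimal sketch of the estimate above, assuming Python (the toy counts are illustrative; practical implementations also smooth the $N_r$ values, e.g. Simple Good-Turing, since $N_{r+1} = 0$ for the largest $r$):

```python
from collections import Counter

def good_turing_probs(ngram_counts):
    """p_GT = (r + 1) * N_{r+1} / (N * N_r), where N_r is the number of
    distinct n-grams seen exactly r times and N is the total count."""
    N = sum(ngram_counts.values())
    freq_of_freqs = Counter(ngram_counts.values())  # maps r -> N_r
    return {
        ngram: (r + 1) * freq_of_freqs.get(r + 1, 0) / (N * freq_of_freqs[r])
        for ngram, r in ngram_counts.items()
    }

counts = Counter(["the cat", "the cat", "the dog", "a cat"])
print(good_turing_probs(counts))  # mass shifts toward low-frequency n-grams
```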

## Combination Smoothing

### Interpolation Smoothing

#### Jelinek-Mercer Smoothing
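
The usual form (added here for completeness): linearly interpolate the maximum-likelihood higher-order estimate with a lower-order model, with $\lambda$ tuned on held-out data:

$$
p_{JM}(w_i|w_{i-1}) = \lambda \, p_{ML}(w_i|w_{i-1}) + (1-\lambda) \, p(w_i)
$$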

### Katz Smoothing (Backoff Model)

### Kneser-Ney Smoothing

## Links

* [**Slides - Stanford NLP Lunch Tutorial: Smoothing**](https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf)
* [Wiki - Smoothing](https://en.wikipedia.org/wiki/Smoothing)
* [NLP Notes - A Summary of Smoothing Methods (平滑方法小結)](http://www.shuang0420.com/2017/03/24/NLP%20%E7%AC%94%E8%AE%B0%20-%20%E5%B9%B3%E6%BB%91%E6%96%B9%E6%B3%95(Smoothing)%E5%B0%8F%E7%BB%93/)
49 changes: 49 additions & 0 deletions Notes/Concept/Dialogue.md
@@ -0,0 +1,49 @@
# Dialogue

## Overview

### Category

* **Task-oriented** dialogue: to get something done during conversation
  * Assistive
    * customer service
    * giving recommendations
    * question answering
  * Co-operative
    * two agents solve a task together through dialogue
  * Adversarial
    * two agents compete in a task through dialogue
* **Social** dialogue: no explicit task
  * Chit-chat
    * for fun or company
  * Therapy / mental wellbeing

### Approach

* pre-neural dialogue systems
  * pre-defined templates
  * retrieve an appropriate response from a corpus of responses (a minimal sketch below)
* open-ended freeform dialogue systems
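
A minimal sketch of the retrieval approach, assuming Python with scikit-learn (the toy (context, response) pairs are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus of (context, response) pairs.
pairs = [
    ("how are you", "i am fine, thanks"),
    ("what is your name", "i am a bot"),
    ("goodbye", "see you later"),
]
contexts = [context for context, _ in pairs]
vectorizer = TfidfVectorizer().fit(contexts)
context_matrix = vectorizer.transform(contexts)

def retrieve(query):
    """Return the response whose stored context is most similar to the query."""
    similarities = cosine_similarity(vectorizer.transform([query]), context_matrix)[0]
    return pairs[similarities.argmax()][1]

print(retrieve("how are you doing"))  # -> "i am fine, thanks"
```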

## Problems / Solution

A naive application of standard seq2seq+attention methods has serious, pervasive deficiencies for (chitchat) dialogue:

* Genericness / boring responses
* Irrelevant responses (not sufficiently related to context)
* Repetition
* Lack of context (not remembering conversation history)
* Lack of consistent persona

### Irrelevant response problem

* [[1510.03055] A Diversity-Promoting Objective Function for Neural Conversation Models](https://arxiv.org/abs/1510.03055)
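
In brief, that paper replaces standard likelihood decoding with a Maximum Mutual Information objective (the anti-language-model form shown here), penalizing responses that are likely regardless of the input $S$:

$$
\hat{T} = \arg\max_T \left\{ \log p(T|S) - \lambda \log p(T) \right\}
$$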

### Genericness / boring response problem

... cs224n lecture 15 slides

## Resources

* [[1506.05869] A Neural Conversational Model](https://arxiv.org/abs/1506.05869)
* [Neural Responding Machine for Short-Text Conversation - ACL Anthology](https://www.aclweb.org/anthology/P15-1152/)
20 changes: 20 additions & 0 deletions Notes/Concept/Embedding.md
@@ -0,0 +1,20 @@
# Embedding

Advanced Topics in Natural Language Processing (自然語言高級專題), Lecture 8

> * vs. feature engineering in statistical machine learning
> * vs. dictionary-based word representations

Dictionary-based resources:

* TongYiCi CiLin (同義詞詞林)
* CCD (Chinese Concept Dictionary)
* [HowNet](http://www.keenage.com/)

* [Embedding/Chinese-Word-Vectors: 100+ pre-trained Chinese word vectors (上百种预训练中文词向量)](https://github.com/Embedding/Chinese-Word-Vectors)
8 changes: 8 additions & 0 deletions Notes/Concept/GenerativeMethod.md
@@ -0,0 +1,8 @@
# Generative Method

* [CycleGAN Project Page](https://junyanz.github.io/CycleGAN/)
* [junyanz/CycleGAN: Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.](https://github.com/junyanz/CycleGAN)
9 changes: 9 additions & 0 deletions Notes/Concept/ImageCaptioning.md
@@ -0,0 +1,9 @@
# Image Captioning

cs224n lecture 15...

## Dataset

COCO (Common Objects in Context)

[[1805.04833] Hierarchical Neural Story Generation](https://arxiv.org/abs/1805.04833)
5 changes: 5 additions & 0 deletions Notes/Concept/KnowledgeEmbedding.md
@@ -0,0 +1,5 @@
# Knowledge Embedding

## Resources

* [OpenKE](https://github.com/thunlp/OpenKE)
21 changes: 21 additions & 0 deletions Notes/Concept/MachineTranslation.md
@@ -0,0 +1,21 @@
# Machine Translation

## Overview

> * Rule-based Approach
> * Corpus-based Approach

### History

* 1950s: Early Machine Translation
  * mostly *rule-based* - using a bilingual dictionary to map words to their counterparts
* 1990s-2010: Statistical Machine Translation
  * learn a *probabilistic model* from data
  * $\arg\max_y P(y|x) = \arg\max_y\underbrace{P(x|y)}_{\text{Translation Model}}\underbrace{P(y)}_{\text{Language Model}}$ (derivation after this list)
  * learning alignment: the correspondence between particular words in the translated sentence pair
* 2014 onward: Neural Machine Translation
  * [sequence-to-sequence](../Mechanism/seq-to-seq.md)
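
The decomposition above is Bayes' rule with the denominator dropped, since $P(x)$ is constant with respect to $y$:

$$
\arg\max_y P(y|x) = \arg\max_y \frac{P(x|y)P(y)}{P(x)} = \arg\max_y P(x|y)P(y)
$$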

## Resources

* [Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 8 – Translation, Seq2Seq, Attention - YouTube](https://www.youtube.com/watch?v=XXtpJxZBa2c&feature=youtu.be)
25 changes: 25 additions & 0 deletions Notes/Concept/ModelCompression.md
@@ -0,0 +1,25 @@
# Model Compression / Knowledge Distillation

## Resources

### Paper

Classic

* [Model compression](https://dl.acm.org/citation.cfm?id=1150464)
* [[1503.02531] Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
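
As a quick illustration of the soft-target objective from the Hinton et al. paper above, a minimal sketch assuming PyTorch (the function name and hyperparameter values are illustrative):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation (Hinton et al., 2015): KL divergence between
    temperature-softened student and teacher distributions, mixed with the
    usual cross-entropy on the hard labels."""
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    # T^2 keeps the soft-term gradient magnitude comparable to the hard term.
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```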

Survey

* [FLHonker/Awesome-Knowledge-Distillation: Awesome Knowledge-Distillation.](https://github.com/FLHonker/Awesome-Knowledge-Distillation)
* [[1710.09282] A Survey of Model Compression and Acceleration for Deep Neural Networks](https://arxiv.org/abs/1710.09282)

### Tools

* [NervanaSystems/distiller: Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research.](https://github.com/NervanaSystems/distiller)
* [Distiller Documentation](https://nervanasystems.github.io/distiller)
* [GMvandeVen/continual-learning: PyTorch implementation of various methods for continual learning (XdG, EWC, online EWC, SI, LwF, DGR, DGR+distill, RtF, iCaRL).](https://github.com/GMvandeVen/continual-learning)

PyTorch

* [peterliht/knowledge-distillation-pytorch: A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility](https://github.com/peterliht/knowledge-distillation-pytorch)
3 changes: 3 additions & 0 deletions Notes/Concept/Storytelling.md
@@ -0,0 +1,3 @@
# Storytelling

[Storytelling Workshop 2019](http://www.visionandlanguage.net/workshop2019/)
17 changes: 17 additions & 0 deletions Notes/Concept/SubwordsModel.md
@@ -0,0 +1,17 @@
# Subword Models

## Byte Pair Encoding (BPE)

* SentencePiece
* [google/sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.](https://github.com/google/sentencepiece)
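
A minimal sketch of the BPE merge loop (in the style of Sennrich et al.), assuming Python; the toy vocabulary is illustrative:

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the space-separated vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of `pair` into a single new symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(10):  # learn 10 merge operations
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print(best)  # e.g. ('e', 's') first, then ('es', 't'), ...
```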

## fastText

Aim ... (cs224n lecture12 slides)

An extension of the word2vec skip-gram model with character n-grams (a minimal sketch below).
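
A minimal sketch of fastText-style character n-gram extraction, assuming Python (the boundary markers and the 3-6 length range follow the fastText paper's defaults):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Wrap the word in boundary markers, then collect all n-grams of
    length n_min..n_max plus the wrapped word itself; the word vector
    is the sum of the vectors of these units."""
    wrapped = f"<{word}>"
    grams = {wrapped}
    for n in range(n_min, n_max + 1):
        grams.update(wrapped[i:i + n] for i in range(len(wrapped) - n + 1))
    return grams

print(sorted(char_ngrams("where", 3, 3)))
# ['<wh', '<where>', 'ere', 'her', 're>', 'whe']
```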

## Resources

* [Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 12 – Subword Models - YouTube](https://www.youtube.com/watch?v=9oTHFx0Gg3Q&feature=youtu.be)