4 changes: 3 additions & 1 deletion .gitignore
@@ -67,7 +67,9 @@ Project/CWSNER/model/

# NCTU_DL_HW1
# TibetanMNIST
Project/NCTU_DL_HW1/TibetanMNIST.npz
Project/NCTU_DL/HW1/TibetanMNIST-DNN/TibetanMNIST.npz
# NCTU_DL_HW2
Project/NCTU_DL/HW2/*.zip

# Byte-compiled / optimized / DLL files
__pycache__/
1 change: 1 addition & 0 deletions .lfsconfig
@@ -1,2 +1,3 @@
[lfs]
url = https://gitlab.com/daviddwlee84/DeepLearningPractice.git/info/lfs

6 changes: 6 additions & 0 deletions Notes/Application/NLP/EnglishPreprocessing.md
@@ -0,0 +1,6 @@
# English Preprocessing

* Tokenizer
* Stop Words

[All you need to know about NLP Text Preprocessing](https://gdcoder.com/all-you-need-to-know-about-nlp-text-preprocessing/)
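
A minimal sketch of both steps, assuming Python with NLTK (the sample sentence is illustrative, and newer NLTK releases may also require the `punkt_tab` resource):

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time resource downloads (cached after the first run).
nltk.download("punkt")
nltk.download("stopwords")

text = "This is an example sentence showing off stop word filtering."
tokens = word_tokenize(text.lower())               # tokenization
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.isalpha() and t not in stop_words]
print(filtered)  # ['example', 'sentence', 'showing', 'stop', 'word', 'filtering']
```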
71 changes: 71 additions & 0 deletions Notes/Concept/DataSmoothing.md
@@ -0,0 +1,71 @@
# Data Smoothing

## Background

### [N-gram](N-GramModel.md)

### [Data Sparsity](N-GramModel.md#Data-Sparseness)

## Overview of Smoothing Technique

Simple Smoothing

* Additive smoothing
* Add-one smoothing
* Held-out Estimation (留存估計)
* Deleted Estimation / Two-way Cross Validation (刪除估計)
* Good Turing smoothing
* ... etc.

Combination Smoothing

* Interpolation smoothing (插值)
* Jelinek-Mercer smoothing
* Katz smoothing (backoff) (退回模型)
* Kneser-Ney smoothing

## Simple Smoothing

> All n-grams that never appeared are assigned the same probability.

### Add-one Smoothing

> Add one to the frequency of each n-gram.

#### Additive Smoothing

> Add $\delta$ instead of one to the frequency of each n-gram
> (typically $0 < \delta \leq 1$).
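
A minimal sketch of additive smoothing over bigram counts, assuming Python (the toy corpus and function name are illustrative):

```python
from collections import Counter

def add_delta_prob(bigram, bigram_counts, unigram_counts, vocab_size, delta=0.5):
    """P(w2 | w1) with additive smoothing:
    (count(w1 w2) + delta) / (count(w1) + delta * |V|).
    delta = 1 gives add-one (Laplace) smoothing."""
    w1, w2 = bigram
    return (bigram_counts[bigram] + delta) / (unigram_counts[w1] + delta * vocab_size)

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
print(add_delta_prob(("the", "cat"), bigrams, unigrams, len(unigrams)))  # seen bigram
print(add_delta_prob(("the", "dog"), bigrams, unigrams, len(unigrams)))  # unseen, still > 0
```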

### Held-out Estimation

> A good method when the corpus is large, since it needs to split the data into two sets.
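
For reference, the usual form of the estimate (added here, not in the original note): with $T_r$ the total count in the held-out set of all n-grams that occurred $r$ times in training, $N_r$ the number of such n-grams, and $N'$ the held-out set size,

$$
p_{ho} (\text{an n-gram occurring } r \text{ times}) = \frac{T_r}{N_r \cdot N'}
$$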

#### Deleted Estimation / Two-way Cross Validation

> Preferred when the corpus is small.

### Good Turing Smoothing

$$
p_{GT} (\text{an n-gram occurring } r \text{ times}) = \frac{(r+1)N_{r+1}}{N\cdot N_r}
$$

* [Wiki - Good–Turing frequency estimation](https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation)
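
A minimal sketch of the estimate above, assuming Python (the toy counts are illustrative; practical implementations also smooth the $N_r$ values, e.g. Simple Good-Turing, since $N_{r+1} = 0$ for the largest $r$):

```python
from collections import Counter

def good_turing_probs(ngram_counts):
    """p_GT = (r + 1) * N_{r+1} / (N * N_r), where N_r is the number of
    distinct n-grams seen exactly r times and N is the total count."""
    N = sum(ngram_counts.values())
    freq_of_freqs = Counter(ngram_counts.values())  # maps r -> N_r
    return {
        ngram: (r + 1) * freq_of_freqs.get(r + 1, 0) / (N * freq_of_freqs[r])
        for ngram, r in ngram_counts.items()
    }

counts = Counter(["the cat", "the cat", "the dog", "a cat"])
print(good_turing_probs(counts))  # mass shifts toward low-frequency n-grams
```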

## Combination Smoothing

### Interpolation Smoothing

#### Jelinek-Mercer Smoothing
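
The usual form (added here for completeness): linearly interpolate the maximum-likelihood higher-order estimate with a lower-order model, with $\lambda$ tuned on held-out data:

$$
p_{JM}(w_i|w_{i-1}) = \lambda \, p_{ML}(w_i|w_{i-1}) + (1-\lambda) \, p(w_i)
$$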

### Katz Smoothing (Backoff Model)

### Kneser-Ney Smoothing

## Links

* [**Slides - Stanford NLP Lunch Tutorial: Smoothing**](https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf)
* [Wiki - Smoothing](https://en.wikipedia.org/wiki/Smoothing)
* [NLP Notes - A Summary of Smoothing Methods (平滑方法小結)](http://www.shuang0420.com/2017/03/24/NLP%20%E7%AC%94%E8%AE%B0%20-%20%E5%B9%B3%E6%BB%91%E6%96%B9%E6%B3%95(Smoothing)%E5%B0%8F%E7%BB%93/)
49 changes: 49 additions & 0 deletions Notes/Concept/Dialogue.md
@@ -0,0 +1,49 @@
# Dialogue

## Overview

### Category

* **Task-oriented** dialogue: to get something done during conversation
  * Assistive
    * customer service
    * giving recommendations
    * question answering
  * Co-operative
    * two agents solve a task together through dialogue
  * Adversarial
    * two agents compete in a task through dialogue
* **Social** dialogue: no explicit task
  * Chit-chat
    * for fun or company
  * Therapy / mental wellbeing

### Approach

* pre-neural dialogue systems
  * pre-defined templates
  * retrieve an appropriate response from a corpus of responses (a minimal sketch below)
* open-ended freeform dialogue systems
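
A minimal sketch of the retrieval approach, assuming Python with scikit-learn (the toy (context, response) pairs are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus of (context, response) pairs.
pairs = [
    ("how are you", "i am fine, thanks"),
    ("what is your name", "i am a bot"),
    ("goodbye", "see you later"),
]
contexts = [context for context, _ in pairs]
vectorizer = TfidfVectorizer().fit(contexts)
context_matrix = vectorizer.transform(contexts)

def retrieve(query):
    """Return the response whose stored context is most similar to the query."""
    similarities = cosine_similarity(vectorizer.transform([query]), context_matrix)[0]
    return pairs[similarities.argmax()][1]

print(retrieve("how are you doing"))  # -> "i am fine, thanks"
```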

## Problems / Solution

A naive application of standard seq2seq+attention methods has serious, pervasive deficiencies for (chitchat) dialogue:

* Genericness / boring responses
* Irrelevant responses (not sufficiently related to context)
* Repetition
* Lack of context (not remembering conversation history)
* Lack of consistent persona

### Irrelevant response problem

* [[1510.03055] A Diversity-Promoting Objective Function for Neural Conversation Models](https://arxiv.org/abs/1510.03055)
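
In brief, that paper replaces standard likelihood decoding with a Maximum Mutual Information objective (the anti-language-model form shown here), penalizing responses that are likely regardless of the input $S$:

$$
\hat{T} = \arg\max_T \left\{ \log p(T|S) - \lambda \log p(T) \right\}
$$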

### Genericness / boring response problem

... cs224n lecture 15 slides

## Resources

* [[1506.05869] A Neural Conversational Model](https://arxiv.org/abs/1506.05869)
* [Neural Responding Machine for Short-Text Conversation - ACL Anthology](https://www.aclweb.org/anthology/P15-1152/)
20 changes: 20 additions & 0 deletions Notes/Concept/Embedding.md
@@ -0,0 +1,20 @@
# Embedding

Advanced Topics in Natural Language Processing (自然語言高級專題), Lecture 8

> * vs. feature engineering in statistical machine learning
> * vs. dictionary-based word representations

Dictionary-based resources:

* TongYiCi CiLin (同義詞詞林)
* CCD (Chinese Concept Dictionary)
* [HowNet](http://www.keenage.com/)

* [Embedding/Chinese-Word-Vectors: 100+ pre-trained Chinese word vectors (上百种预训练中文词向量)](https://github.com/Embedding/Chinese-Word-Vectors)
8 changes: 8 additions & 0 deletions Notes/Concept/GenerativeMethod.md
@@ -0,0 +1,8 @@
# Generative Method

* [CycleGAN Project Page](https://junyanz.github.io/CycleGAN/)
* [junyanz/CycleGAN: Software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more.](https://github.com/junyanz/CycleGAN)
9 changes: 9 additions & 0 deletions Notes/Concept/ImageCaptioning.md
@@ -0,0 +1,9 @@
# Image Captioning

cs224n lecture 15...

## Dataset

COCO (Common Objects in Context)

[[1805.04833] Hierarchical Neural Story Generation](https://arxiv.org/abs/1805.04833)
5 changes: 5 additions & 0 deletions Notes/Concept/KnowledgeEmbedding.md
@@ -0,0 +1,5 @@
# Knowledge Embedding

## Resources

* [OpenKE](https://github.com/thunlp/OpenKE)
21 changes: 21 additions & 0 deletions Notes/Concept/MachineTranslation.md
@@ -0,0 +1,21 @@
# Machine Translation

## Overview

> * Rule-based Approach
> * Corpus-based Approach

### History

* 1950s: Early Machine Translation
  * mostly *rule-based* - using a bilingual dictionary to map words to their counterparts
* 1990s-2010: Statistical Machine Translation
  * learn a *probabilistic model* from data
  * $\arg\max_y P(y|x) = \arg\max_y\underbrace{P(x|y)}_{\text{Translation Model}}\underbrace{P(y)}_{\text{Language Model}}$ (derivation after this list)
  * learning alignment: the correspondence between particular words in the translated sentence pair
* 2014 onward: Neural Machine Translation
  * [sequence-to-sequence](../Mechanism/seq-to-seq.md)
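
The decomposition above is Bayes' rule with the denominator dropped, since $P(x)$ is constant with respect to $y$:

$$
\arg\max_y P(y|x) = \arg\max_y \frac{P(x|y)P(y)}{P(x)} = \arg\max_y P(x|y)P(y)
$$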

## Resources

* [Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 8 – Translation, Seq2Seq, Attention - YouTube](https://www.youtube.com/watch?v=XXtpJxZBa2c&feature=youtu.be)
25 changes: 25 additions & 0 deletions Notes/Concept/ModelCompression.md
@@ -0,0 +1,25 @@
# Model Compression / Knowledge Distillation

## Resources

### Paper

Classic

* [Model compression](https://dl.acm.org/citation.cfm?id=1150464)
* [[1503.02531] Distilling the Knowledge in a Neural Network](https://arxiv.org/abs/1503.02531)
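
As a quick illustration of the soft-target objective from the Hinton et al. paper above, a minimal sketch assuming PyTorch (the function name and hyperparameter values are illustrative):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation (Hinton et al., 2015): KL divergence between
    temperature-softened student and teacher distributions, mixed with the
    usual cross-entropy on the hard labels."""
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    # T^2 keeps the soft-term gradient magnitude comparable to the hard term.
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```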

Survey

* [FLHonker/Awesome-Knowledge-Distillation: Awesome Knowledge-Distillation.](https://github.com/FLHonker/Awesome-Knowledge-Distillation)
* [[1710.09282] A Survey of Model Compression and Acceleration for Deep Neural Networks](https://arxiv.org/abs/1710.09282)

### Tools

* [NervanaSystems/distiller: Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research.](https://github.com/NervanaSystems/distiller)
* [Distiller Documentation](https://nervanasystems.github.io/distiller)
* [GMvandeVen/continual-learning: PyTorch implementation of various methods for continual learning (XdG, EWC, online EWC, SI, LwF, DGR, DGR+distill, RtF, iCaRL).](https://github.com/GMvandeVen/continual-learning)

PyTorch

* [peterliht/knowledge-distillation-pytorch: A PyTorch implementation for exploring deep and shallow knowledge distillation (KD) experiments with flexibility](https://github.com/peterliht/knowledge-distillation-pytorch)
3 changes: 3 additions & 0 deletions Notes/Concept/Storytelling.md
@@ -0,0 +1,3 @@
# Storytelling

[Storytelling Workshop 2019](http://www.visionandlanguage.net/workshop2019/)
17 changes: 17 additions & 0 deletions Notes/Concept/SubwordsModel.md
@@ -0,0 +1,17 @@
# Subword Models

## Byte Pair Encoding (BPE)

* SentencePiece
* [google/sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.](https://github.com/google/sentencepiece)
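
A minimal sketch of the BPE merge loop (in the style of Sennrich et al.), assuming Python; the toy vocabulary is illustrative:

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the space-separated vocabulary."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    """Merge every occurrence of `pair` into a single new symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words split into characters, with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2,
         "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(10):  # learn 10 merge operations
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)
    vocab = merge_pair(best, vocab)
    print(best)  # e.g. ('e', 's') first, then ('es', 't'), ...
```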

## fastText

Aim ... (cs224n lecture12 slides)

An extension of the word2vec skip-gram model with character n-grams (a minimal sketch below).
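
A minimal sketch of fastText-style character n-gram extraction, assuming Python (the boundary markers and the 3-6 length range follow the fastText paper's defaults):

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Wrap the word in boundary markers, then collect all n-grams of
    length n_min..n_max plus the wrapped word itself; the word vector
    is the sum of the vectors of these units."""
    wrapped = f"<{word}>"
    grams = {wrapped}
    for n in range(n_min, n_max + 1):
        grams.update(wrapped[i:i + n] for i in range(len(wrapped) - n + 1))
    return grams

print(sorted(char_ngrams("where", 3, 3)))
# ['<wh', '<where>', 'ere', 'her', 're>', 'whe']
```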

## Resources

* [Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 12 – Subword Models - YouTube](https://www.youtube.com/watch?v=9oTHFx0Gg3Q&feature=youtu.be)