Commit af4b3f6

Added list of supported formats to the readme
1 parent f8e3afb commit af4b3f6

File tree

3 files changed

+121
-94
lines changed

README.md

Lines changed: 117 additions & 91 deletions
Original file line number | Diff line number | Diff line change
@@ -1,50 +1,57 @@
11
# Detect-File-Encoding-and-Language
2+
23
![npm](https://img.shields.io/npm/dw/detect-file-encoding-and-language)
34
![npm](https://img.shields.io/npm/v/detect-file-encoding-and-language)
45
![npm bundle size](https://img.shields.io/bundlephobia/min/detect-file-encoding-and-language)
56

67
[![NPM stats](https://nodei.co/npm/detect-file-encoding-and-language.svg?downloadRank=true&downloads=true)](https://www.npmjs.org/package/detect-file-encoding-and-language)
78

89
## Functionality
9-
Determine the encoding and language of any text file!
1010

11-
* Detects 40 languages as well as the appropriate encoding
12-
* Works best with large inputs
13-
* Completely free, no API key required
11+
Determine the encoding and language of text files!
12+
13+
- Detects 40 languages as well as the appropriate encoding
14+
- Available as CLI, in Node.js and in the browser
15+
- Supports .txt, .srt, and .sub
16+
- Works best with large inputs
17+
- Completely free, no API key required
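The supported formats listed above could be checked before a file is handed to the detector. The following is a minimal Node.js sketch under stated assumptions: `hasSupportedExtension` and `SUPPORTED_EXTENSIONS` are hypothetical names that are not part of this package, and nothing here says whether the package itself rejects other extensions.

```js
const path = require("path");

// Extensions taken from the feature list in this README.
const SUPPORTED_EXTENSIONS = new Set([".txt", ".srt", ".sub"]);

// Hypothetical helper (not part of the package's API): returns true when
// the file name ends in one of the listed extensions, ignoring case.
function hasSupportedExtension(fileName) {
  const extension = path.extname(fileName).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(extension);
}
```

A caller might skip detection, or warn the user, when `hasSupportedExtension("subtitles.srt")` returns `false`. In the browser the `path` module is not available, so the extension would have to be taken from `file.name` by other means.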
1418

15-
For reliable encoding and language detection, use files containing 500 words or more. Smaller inputs can work as well but the results might be less accurate and in some cases incorrect.
19+
For reliable encoding and language detection, use files containing 500 words or more. Smaller inputs can work as well but the results might be less accurate and in some cases incorrect.
1620

1721
Feel free to test the functionality of this NPM package [here](https://encoding-and-language-detector.netlify.app/). Upload your own files and see if the encoding and language are detected correctly!
1822

1923
## Index
24+
2025
- [Detect-File-Encoding-and-Language](#detect-file-encoding-and-language)
21-
* [Functionality](#functionality)
22-
* [Index](#index)
23-
* [Usage](#usage)
24-
+ [In the browser](#in-the-browser)
26+
- [Functionality](#functionality)
27+
- [Index](#index)
28+
- [Usage](#usage)
29+
- [In the browser](#in-the-browser)
2530
- [Using the script tag](#using-the-script-tag)
26-
* [Via CDN](#via-cdn)
27-
* [Via download](#via-download)
28-
* [Usage](#usage-1)
31+
- [Via CDN](#via-cdn)
32+
- [Via download](#via-download)
33+
- [Usage](#usage-1)
2934
- [Using a bundler](#using-a-bundler)
30-
* [Installation](#installation)
31-
* [Usage](#usage-2)
32-
+ [In Node.js](#in-nodejs)
35+
- [Installation](#installation)
36+
- [Usage](#usage-2)
37+
- [In Node.js](#in-nodejs)
3338
- [Installation](#installation-1)
3439
- [Usage](#usage-3)
35-
+ [In the terminal (CLI)](#in-the-terminal-cli)
40+
- [In the terminal (CLI)](#in-the-terminal-cli)
3641
- [Installation](#installation-2)
3742
- [Usage](#usage-4)
38-
* [Supported Languages](#supported-languages)
39-
* [Used Encodings](#used-encodings)
40-
* [Confidence Score](#confidence-score)
41-
* [Known Issues](#known-issues)
42-
* [License](#license)
43+
- [Supported Languages](#supported-languages)
44+
- [Used Encodings](#used-encodings)
45+
- [Confidence Score](#confidence-score)
46+
- [Known Issues](#known-issues)
47+
- [License](#license)
4348

4449
## Usage
50+
4551
There are several ways in which you can use this NPM package. You can use it as a [command-line interface](#in-the-terminal-cli), server-side [with Node.js](#in-nodejs) or client-side [in the browser](#in-the-browser).
4652

4753
### In the browser
54+
4855
In the body section of your html file, create an input element of type `file` and give it an id.
4956

5057
```js
@@ -58,9 +65,11 @@ In the body section of your html file, create an input element of type `file` an
5865
Next, load the module either by [using the script tag](#using-the-script-tag) or by [using a bundler](#using-a-bundler)!
5966

6067
#### Using the script tag
68+
6169
When loading it via the `<script>` tag, you can either use the CDN version or download the code itself and include it in your project. For a quickstart use the [CDN version](#via-cdn). If you want to be able to use it offline, [download and include it](#via-download)!
6270

6371
##### Via CDN
72+
6473
```js
6574
// index.html
6675

@@ -71,10 +80,11 @@ When loading it via the `<script>` tag, you can either use the CDN version or do
7180
</body>
7281
```
7382

74-
Now that you've loaded the module, you can [start using it](#usage-1).
83+
Now that you've loaded the module, you can [start using it](#usage-1).
7584

7685
##### Via download
77-
1. Create a new folder called `lib` inside your root directory
86+
87+
1. Create a new folder called `lib` inside your root directory
7888
2. Inside `lib` create a new file and call it `language-encoding.min.js`
7989
3. Make sure the encoding of your newly created file is either `UTF-8` or `UTF-8 with BOM` before proceeding!
8090
4. Go to https://unpkg.com/detect-file-encoding-and-language/umd/language-encoding.min.js and copy the code
@@ -92,72 +102,84 @@ Now that you've loaded the module, you can [start using it](#usage-1).
92102
```
93103

94104
##### Usage
105+
95106
The `<script>` tag exposes the `languageEncoding` function to everything in the DOM located beneath it. When you call it and pass in the file that you want to analyze, it'll return a Promise that you can use to retrieve the encoding, language and confidence score as shown in the example below.
96107

97108
```js
98109
// app.js
99110

100-
document.getElementById("my-input-field").addEventListener("change", inputHandler);
111+
document
112+
.getElementById("my-input-field")
113+
.addEventListener("change", inputHandler);
101114

102115
function inputHandler(e) {
103-
const file = e.target.files[0];
116+
const file = e.target.files[0];
104117

105-
languageEncoding(file).then(fileInfo => console.log(fileInfo));
106-
// Possible result: { language: english, encoding: UTF-8, confidence: 0.97}
118+
languageEncoding(file).then((fileInfo) => console.log(fileInfo));
119+
// Possible result: { language: english, encoding: UTF-8, confidence: 0.97}
107120
}
108121
```
109122

110123
#### Using a bundler
111124

112125
##### Installation
126+
113127
```bash
114128
$ npm install detect-file-encoding-and-language
115129
```
116130

117131
##### Usage
132+
118133
```js
119134
// app.js
120135

121136
const languageEncoding = require("detect-file-encoding-and-language");
122137

123-
document.getElementById("my-input-field").addEventListener("change", inputHandler);
138+
document
139+
.getElementById("my-input-field")
140+
.addEventListener("change", inputHandler);
124141

125142
function inputHandler(e) {
126-
const file = e.target.files[0];
143+
const file = e.target.files[0];
127144

128-
languageEncoding(file).then(fileInfo => console.log(fileInfo));
129-
// Possible result: { language: english, encoding: UTF-8, confidence: 0.97}
145+
languageEncoding(file).then((fileInfo) => console.log(fileInfo));
146+
// Possible result: { language: english, encoding: UTF-8, confidence: 0.97}
130147
}
131148
```
132149

133150
> Note: This works great with frameworks such as React because they are doing the bundling for you. However, if you're using pure vanilla JavaScript, you will have to bundle it yourself!
134151
135152
### In Node.js
136153

137-
#### Installation
154+
#### Installation
155+
138156
```bash
139157
$ npm install detect-file-encoding-and-language
140158
```
141159

142160
#### Usage
161+
143162
```js
144163
// index.js
145164

146165
const languageEncoding = require("detect-file-encoding-and-language");
147166

148-
const pathToFile = "/home/username/documents/my-text-file.txt"
167+
const pathToFile = "/home/username/documents/my-text-file.txt";
149168

150-
languageEncoding(pathToFile).then(fileInfo => console.log(fileInfo));
169+
languageEncoding(pathToFile).then((fileInfo) => console.log(fileInfo));
151170
// Possible result: { language: japanese, encoding: Shift-JIS, confidence: 1 }
152171
```
153172

154173
### In the terminal (CLI)
155174

156-
#### Installation
175+
#### Installation
176+
157177
```bash
158178
$ npm install -g detect-file-encoding-and-language
159179
```
180+
160181
#### Usage
182+
161183
Once installed you'll be able to use the command `dfeal` to retrieve the encoding and language of your text files.
162184

163185
```bash
@@ -173,69 +195,73 @@ $ dfeal /home/user\ name/Documents/subtitle\ file.srt
173195
```
174196

175197
## Supported Languages
176-
* Polish
177-
* Czech
178-
* Hungarian
179-
* Romanian
180-
* Slovak
181-
* Slovenian
182-
* Albanian
183-
* Russian
184-
* Ukrainian
185-
* Bulgarian
186-
* English
187-
* French
188-
* Portuguese
189-
* Spanish
190-
* German
191-
* Italian
192-
* Danish
193-
* Norwegian
194-
* Swedish
195-
* Dutch
196-
* Finnish
197-
* Serbo-Croatian
198-
* Estonian
199-
* Icelandic
200-
* Malay-Indonesian
201-
* Greek
202-
* Turkish
203-
* Hebrew
204-
* Arabic
205-
* Farsi-Persian
206-
* Lithuanian
207-
* Chinese-Simplified
208-
* Chinese-Traditional
209-
* Japanese
210-
* Korean
211-
* Thai
212-
* Bengali
213-
* Hindi
214-
* Urdu
215-
* Vietnamese
198+
199+
- Polish
200+
- Czech
201+
- Hungarian
202+
- Romanian
203+
- Slovak
204+
- Slovenian
205+
- Albanian
206+
- Russian
207+
- Ukrainian
208+
- Bulgarian
209+
- English
210+
- French
211+
- Portuguese
212+
- Spanish
213+
- German
214+
- Italian
215+
- Danish
216+
- Norwegian
217+
- Swedish
218+
- Dutch
219+
- Finnish
220+
- Serbo-Croatian
221+
- Estonian
222+
- Icelandic
223+
- Malay-Indonesian
224+
- Greek
225+
- Turkish
226+
- Hebrew
227+
- Arabic
228+
- Farsi-Persian
229+
- Lithuanian
230+
- Chinese-Simplified
231+
- Chinese-Traditional
232+
- Japanese
233+
- Korean
234+
- Thai
235+
- Bengali
236+
- Hindi
237+
- Urdu
238+
- Vietnamese
216239

217240
## Used Encodings
218-
* UTF-8
219-
* CP1250
220-
* CP1251
221-
* CP1252
222-
* CP1253
223-
* CP1254
224-
* CP1255
225-
* CP1256
226-
* CP1257
227-
* GB18030
228-
* BIG5
229-
* Shift-JIS
230-
* EUC-KR
231-
* TIS-620
241+
242+
- UTF-8
243+
- CP1250
244+
- CP1251
245+
- CP1252
246+
- CP1253
247+
- CP1254
248+
- CP1255
249+
- CP1256
250+
- CP1257
251+
- GB18030
252+
- BIG5
253+
- Shift-JIS
254+
- EUC-KR
255+
- TIS-620
232256

233257
## Confidence Score
258+
234259
The confidence score ranges from 0 to 1. It is based on the amount of matches that were found for a particular language and the frequency of those matches. If you want to learn more about how it all works, check out the [Wiki entry](https://github.com/gignupg/Detect-File-Encoding-and-Language/wiki)!
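Because the score ranges from 0 to 1, a caller can apply its own cut-off before trusting a result. The sketch below assumes the result shape shown in the usage examples of this README (`{ language, encoding, confidence }` with a numeric `confidence`). `acceptDetection` is a hypothetical helper and `0.8` is an arbitrary illustrative threshold, not a value recommended by the package.

```js
// Hypothetical helper (not part of the package's API).
// Returns the result object when its confidence meets the threshold,
// otherwise null so the caller can fall back to a default or ask the user.
function acceptDetection(fileInfo, minConfidence = 0.8) {
  if (!fileInfo || typeof fileInfo.confidence !== "number") {
    return null;
  }
  return fileInfo.confidence >= minConfidence ? fileInfo : null;
}
```

It could be chained onto the Promise from the usage sections, for example `languageEncoding(file).then((fileInfo) => acceptDetection(fileInfo))`.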
235260

236261
## Known Issues
237-
* Unable to detect Shift-JIS encoded Japanese text files when using Node.js. Solutions are welcome!
238-
* Unable to detect UTF-16-LE encoded files when using Node.js. Solutions are welcome!
262+
263+
- Unable to detect Shift-JIS encoded Japanese text files when using Node.js. Solutions are welcome!
264+
- Unable to detect UTF-16-LE encoded files when using Node.js. Solutions are welcome!
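For the UTF-16-LE issue, one possible partial workaround in Node.js is to look for a byte order mark before calling the detector. This is only a sketch and not something the package does: `hasUtf16LeBom` is a hypothetical helper, and it only catches files that actually start with a BOM (the bytes `0xFF 0xFE`). Files without a BOM are not recognized by it.

```js
// Hypothetical helper (not part of the package's API).
// A UTF-16-LE byte order mark is the byte pair 0xFF 0xFE at the start of
// the file. A UTF-32-LE BOM (0xFF 0xFE 0x00 0x00) begins with the same two
// bytes, so that case is excluded explicitly.
function hasUtf16LeBom(buffer) {
  if (buffer.length < 2 || buffer[0] !== 0xff || buffer[1] !== 0xfe) {
    return false;
  }
  const looksLikeUtf32Le =
    buffer.length >= 4 && buffer[2] === 0x00 && buffer[3] === 0x00;
  return !looksLikeUtf32Le;
}
```

With the built-in `fs` module this could be used as `hasUtf16LeBom(fs.readFileSync(pathToFile))`, treating the file as UTF-16-LE when it returns `true` and otherwise falling back to `languageEncoding(pathToFile)`.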
239265

240266
## License
241267

package-lock.json

Lines changed: 3 additions & 2 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
{
22
"name": "detect-file-encoding-and-language",
3-
"version": "1.7.3",
3+
"version": "1.7.4",
44
"description": "Charset Detector - Detect the encoding and language of any file - Use it in the browser, with Node.js, or via CLI",
55
"main": "src/index-node.js",
66
"scripts": {
