Determine the encoding and language of any text file!

- * Detects 40 languages as well as the appropriate encoding
- * Works best with large inputs
- * Completely free, no API key required
+ Determine the encoding and language of text files!
+
+ - Detects 40 languages as well as the appropriate encoding
+ - Available as CLI, in Node.js and in the browser
+ - Supports .txt, .srt, and .sub
+ - Works best with large inputs
+ - Completely free, no API key required

- For reliable encoding and language detection, use files containing 500 words or more. Smaller inputs can work as well but the results might be less accurate and in some cases incorrect.
+ For reliable encoding and language detection, use files containing 500 words or more. Smaller inputs can work as well but the results might be less accurate and in some cases incorrect.

Feel free to test the functionality of this NPM package [here](https://encoding-and-language-detector.netlify.app/). Upload your own files and see if the encoding and language are detected correctly!
There are several ways in which you can use this NPM package. You can use it as a [command-line interface](#in-the-terminal-cli), server-side [with Node.js](#in-nodejs) or client-side [in the browser](#in-the-browser).

### In the browser
+
In the body section of your html file, create an input element of type `file` and give it an id.

```js
@@ -58,9 +65,11 @@ In the body section of your html file, create an input element of type `file` an
Next, load the module either by [using the script tag](#using-the-script-tag) or by [using a bundler](#using-a-bundler)!

#### Using the script tag
+
When loading it via the `<script>` tag, you can either use the CDN version or download the code itself and include it in your project. For a quickstart use the [CDN version](#via-cdn). If you want to be able to use it offline, [download and include it](#via-download)!

##### Via CDN
+
```js
// index.html
@@ -71,10 +80,11 @@ When loading it via the `<script>` tag, you can either use the CDN version or do
</body>
```

- Now that you've loaded the module, you can [start using it](#usage-1).
+ Now that you've loaded the module, you can [start using it](#usage-1).

##### Via download
- 1. Create a new folder called `lib` inside your root directory
+
+ 1. Create a new folder called `lib` inside your root directory
2. Inside `lib` create a new file and call it `language-encoding.min.js`
3. Make sure the encoding of your newly created file is either `UTF-8` or `UTF-8 with BOM` before proceeding!
4. Go to https://unpkg.com/detect-file-encoding-and-language/umd/language-encoding.min.js and copy the code
@@ -92,72 +102,84 @@ Now that you've loaded the module, you can [start using it](#usage-1).
```

##### Usage
+
The `<script>` tag exposes the `languageEncoding` function to everything in the DOM located beneath it. When you call it and pass in the file that you want to analyze, it'll return a Promise that you can use to retrieve the encoding, language and confidence score as shown in the example below.
// Possible result: { language: english, encoding: UTF-8, confidence: 0.97}
}
```

> Note: This works great with frameworks such as React because they do the bundling for you. However, if you're using plain vanilla JavaScript, you will have to bundle it yourself!
The confidence score ranges from 0 to 1. It is based on the number of matches that were found for a particular language and the frequency of those matches. If you want to learn more about how it all works, check out the [Wiki entry](https://github.com/gignupg/Detect-File-Encoding-and-Language/wiki)!
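
Since the score is a number between 0 and 1, one way to act on it is to accept a detection only above some cutoff and otherwise fall back to a default. The snippet below is a minimal sketch of that idea, not part of the package: `isReliable`, `encodingOrFallback`, the `0.8` cutoff and the `UTF-8` fallback are all hypothetical choices, and the result object is assumed to have the shape shown in the usage example (`{ language, encoding, confidence }` with a numeric `confidence`).

```js
// Hypothetical helpers, not provided by the package.
// Assumes a result shaped like { language, encoding, confidence },
// where confidence is a number between 0 and 1.

// Returns true only if the result carries a numeric confidence
// at or above the chosen cutoff (0.8 is an arbitrary example value).
function isReliable(result, minConfidence = 0.8) {
  return (
    result != null &&
    typeof result.confidence === "number" &&
    result.confidence >= minConfidence
  );
}

// Returns the detected encoding when the result is reliable,
// otherwise the fallback encoding supplied by the caller.
function encodingOrFallback(result, fallback = "UTF-8", minConfidence = 0.8) {
  return isReliable(result, minConfidence) ? result.encoding : fallback;
}

// Example with a hand-written result object.
const sample = { language: "english", encoding: "UTF-8", confidence: 0.97 };
console.log(isReliable(sample)); // true
```

A suitable cutoff depends on your inputs. Short files tend to produce less reliable results, so you may want to experiment with your own data before settling on a value.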

## Known Issues
- * Unable to detect Shift-JIS encoded Japanese text files when using Node.js. Solutions are welcome!
- * Unable to detect UTF-16-LE encoded files when using Node.js. Solutions are welcome!
+
+ - Unable to detect Shift-JIS encoded Japanese text files when using Node.js. Solutions are welcome!
+ - Unable to detect UTF-16-LE encoded files when using Node.js. Solutions are welcome!