new feature: options.truncate length or cb fn#60
new feature: options.truncate length or cb fn#60nmalenovic wants to merge 4 commits intoparshap:masterfrom
Conversation
|
Hi @nmalenovic, thanks for the contribution! 🎉 I think the new option is quite useful as a number, but I'm uncertain of the need to have it as a function. What's your use-case for that? |
Hi @joelmukuthu, as indicated in my PR submission comments - I'd want truncation NOT to exclude file extension (e.g. "lengthyfilename.docx" doesn't become "shortened" missing file extension) or do fancy transformation (change "Very Lengthy Filename" to "VryLngthyFlnm") - often useful when reading first paragrah of a document or an abstract and trying to come up with a file name. |
|
I would like to see the same functionality, where file extensions are preserved. If desired, this could be handled internally with a flag and number instead of a callback function. The module could isolate the extension like this function getFileExtension( filename ) {
return (
( /[.]/.exec( filename ))
? /[^.]+$/.exec( filename )[0]
: ''
)
}and then truncate by the desired amount plus the length of the extension and one more for the dot, remove/replace trailing dots, and finally prepend the extension back to the result. This would solve your suggested scenario. Any other reason a callback might be necessary? |
|
Sorry for the slow response @nmalenovic. I agree with @soulofmischief that not truncating the extension is a desirable feature to have built-in. In which case the callback would not be necessary. For fancy transformations, it think it's better to do that before sanitizing the filename, so something like |
|
Hi @soulofmischief - change made, using path.extname instead. Hi @joelmukuthu - as suggested, function callback dropped and options are now truncateLength and preserveFileExt with defaults to preserve backward compatibility and meaning. |
joelmukuthu
left a comment
There was a problem hiding this comment.
Thanks for the updates Nik. I just made a few comments but otherwise I like the idea!
| * `options.truncateLength`: *optional, number, default: `255`*. Used in truncate call | ||
| to reduce string to the specified length. If <=0, returns ''. | ||
|
|
||
| * `options.preserveFileExt`: *optional, boolean, default: `false`*. By default, |
There was a problem hiding this comment.
I think we could default to keeping the file extension and not make it configurable for now.. it's a breaking change either way and I'm not sure anyone will miss the old behaviour :)
There was a problem hiding this comment.
I agree with @joelmukuthu.
I think preserveFileExt can be true by default and that we don't even need it to be an option. Are there use cases where you wouldn't want this behavior?
I think it's ok for the exact strings this module produces to change between minor versions. I would consider changing things like the 255 default max length a breaking change, but tweaks to the exact string produced can be a minor version change.
Open to other perspectives on this.
There was a problem hiding this comment.
Agreed. As long as it produces 255-chatacter (max) filenames, it shouldn't be a breaking change. My only concern was for users who may use this in tests and have hard-coded some strings. All the same, truncating the file extension away is a bug so technically this is a bug fix.
| * Unsafe characters are either removed or replaced by a substitute set | ||
| * in the optional `options` object. | ||
| * | ||
| * Additionally, you can pass alternative truncation length (default: 255) |
There was a problem hiding this comment.
Needs a little update :)
| * | ||
| * @param {String} input Original filename | ||
| * @param {Object} options {replacement: String | Function } | ||
| * @param {Object} options {replacement: String | Function, truncate: String | Function} |
There was a problem hiding this comment.
truncateLength: Number
| var replacement = (options && options.replacement) || ''; | ||
| var output = sanitize(input, replacement); | ||
| var preserveFileExt = (options && options.preserveFileExt) || false; | ||
| var truncateLength = (options && typeof options.truncateLength == typeof 0) ? options.truncateLength : 255; |
There was a problem hiding this comment.
options.truncateLength === 'number' works better than options.truncateLength == typeof 0 here ;)
| .replace(windowsReservedRe, replacement) | ||
| .replace(windowsTrailingRe, replacement); | ||
| return truncate(sanitized, 255); | ||
| return preserveFileExt ? |
There was a problem hiding this comment.
Hmm.. I think we shouldn't touch the extension while truncating. In my view, the algorithm here would be something like:
- split out the extension, if any
- subtract the length of the extension (including the dot) from
truncateLength. If there's no extension, then truncate the entire string - if greater, truncate the filename (without the dot and extension) to the length calculated in the previous step
- re-join the truncated filename with the extension
As a bonus, it would be nice to only truncate if the length of the filename exceeds truncaeLength in both cases
parshap
left a comment
There was a problem hiding this comment.
I like this feature! Thanks for making the contribution.
I agree with the feedback so far on dropping the callback and considering if this can be a default rather than an option.
Not explicitly approving or requesting changes yet as I think there's some open feedback to address, but overall I am excited about landing this change.
| * `options.preserveFileExt`: *optional, boolean, default: `false`*. By default, | ||
| string is trimmed to the `options.truncateLength` length. If true, file extension | ||
| is preserved within the `options.truncateLength` bounds, unless extension is | ||
| longer then `options.truncateLength` then the result is just the extension trimmed | ||
| to the `options.truncateLength` length. Note: extensions is parsed using | ||
| `path.extname` and as such the leading '.' in the extension name, unless it's a | ||
| 'dot-file' in which case the extension is empty. Note that one exception is files | ||
| that end with a "." - such pattern is transformed away using `options.replacement` | ||
| (default: stripped away). No newline at end of file |
There was a problem hiding this comment.
Nice thorough explanation of edge cases. Even if we remove this as an option, I think this information in the readme would be useful.
| }); | ||
|
|
||
| test("variable length, include/exclude file extension", function(t) { | ||
| t.ok(sanitize("01234567.89A", { truncateLength: 8+1+3 }) == "01234567.89A"); |
There was a problem hiding this comment.
Nit: t.equal() produces better error messages and is how other tests are written.
| return preserveFileExt ? | ||
| truncate(sanitized, Math.max(0, truncateLength - extname(sanitized).length)) + truncate(extname(sanitized), truncateLength) | ||
| : truncate(sanitized, truncateLength); |
There was a problem hiding this comment.
Nit: This ternary is a bit hairy. An if/else and some intermediate variables might make it more readable.
if (preserveFileExt) {
var maxBaseLength = Math.max(0, truncateLength - extname(sanitized).length);
var truncatedBase = truncate(sanitized, maxBaseLength);
var truncatedExt = truncate(extname(sanitized), truncateLength);
return truncatedBase + truncatedExt;
} else {
return truncate(sanitized, truncateLength);
}| t.ok(sanitize("\u000001234567.89ABCDEF", { truncateLength: 0, preserveFileExt: true }) == ""); | ||
| t.ok(sanitize("\u0000", { truncateLength: 255, preserveFileExt: true }) == ""); | ||
| t.ok(sanitize("\u00000123.ABCD", { truncateLength: 4, preserveFileExt: true }) == ".ABC"); | ||
| t.ok(sanitize("01234567.89A", { truncateLength: -12 }) == ""); |
There was a problem hiding this comment.
Can you add a test case like the following? I believe there's a bug in the extension logic.
t.ok(sanitize("01234567.89A", { preserveFileExt: true }) == "01234567.89A");|
The library should not guess the extension (this will not work with eg. |
I tried to minimize the amount of additions and to stay in line with the other code, but some unavoidable conflicts will occur with PR#58 that defines additional options.
Motivation is simple - I have to produce less then 255 character filenames, I don't want to strip file extensions, and I like to map extra lengthy sentences into filenames by stripping white-space, punctuation, and vowels to make for compact filenames without loss of information (for humans at least).