<!DOCTYPE html>
<html lang="en-US">
<head>
<title>datatools</title>
<link rel="stylesheet" href="https://caltechlibrary.github.io/css/site.css">
<link rel="stylesheet" href="https://media.library.caltech.edu/cl-webcomponents/css/code-blocks.css">
<script type="module" src="https://media.library.caltech.edu/cl-webcomponents/copyToClipboard.js"></script>
<script type="module" src="https://media.library.caltech.edu/cl-webcomponents/footer-global.js"></script>
</head>
<body>
<header>
<a href="https://library.caltech.edu"><img src="https://media.library.caltech.edu/assets/caltechlibrary-logo.png" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="index.html">README</a></li>
<li><a href="LICENSE">LICENSE</a></li>
<li><a href="INSTALL.html">INSTALL</a></li>
<li><a href="user_manual.html">User Manual</a></li>
<li><a href="about.html">About</a></li>
<li><a href="search.html">Search</a></li>
<li><a href="https://github.com/caltechlibrary/datatools">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="datatools">datatools</h1>
<p><em>datatools</em> is a rich collection of command line programs
targeting data conversion, cleanup and analysis directly from your
favorite POSIX shell or PowerShell. It has proven useful for data
collaborations where individual members of a project may prefer
different tool sets in their analysis (e.g. Julia, R, Python) but want
to work from a common baseline. It also has been used intensively for
internal reporting from various Caltech Library metadata sources.</p>
<p>The tools fall into three broad categories:</p>
<ul>
<li>data transformation and conversion</li>
<li>shell scripting helpers</li>
<li>“string”, a tool providing the common string operations missing from
the shell</li>
</ul>
<p>See the <a href="user_manual.html">user manual</a> for a complete list of
the command line programs. The data transformation tools include support
for formats such as Excel XML, CSV, tab delimited files, JSON, YAML,
TOML and URL encoding/decoding.</p>
<p>Compiled versions of the datatools collection are provided for Linux
(aarch64/amd64), macOS (aarch64/amd64), Windows 10 (aarch64/amd64)
and Raspberry Pi OS (aarch64). See
<a href="https://github.com/caltechlibrary/datatools/releases">https://github.com/caltechlibrary/datatools/releases</a>.</p>
<p>Use the “-help” option to see the full list of options for each utility
(e.g. <code>csv2json -help</code>).</p>
<h2 id="data-transformation">Data transformation</h2>
<p>The transformation tooling includes data conversion tools that work
with CSV, tab delimited, JSON, TOML, YAML, Excel XML, and URL encoded
text.</p>
<p>There is also tooling to change data shapes using JSON as the
intermediate data format.</p>
<h2 id="for-the-shell">For the shell</h2>
<p>Various utilities for simplifying work on the command line.</p>
<ul>
<li><a href="docs/mergepath/">mergepath</a> - prefix, append, clip path
variables</li>
<li><a href="reldocpath.1.html">reldocpath</a> - calculates a relative
path given two paths</li>
<li><a href="docs/range/">range</a> - emit a range of integers (useful
for numbered loops in Bash)</li>
<li><a href="docs/reldate/">reldate</a> - display a relative date in
YYYY-MM-DD format</li>
<li><a href="docs/reltime/">reltime</a> - display a relative time in 24
hour notation, HH:MM:SS format</li>
<li><a href="docs/timefmt/">timefmt</a> - format a time value based on
Golang’s time format language</li>
<li><a href="docs/urlparse/">urlparse</a> - split a URL into parts</li>
</ul>
<h2 id="for-strings">For strings</h2>
<p><em>datatools</em> provides the <a href="docs/string/">string</a>
command for working with text strings (limited to memory available).
This is commonly needed when cleaning up data for analysis. The
<em>string</em> command was created for when the old Unix standbys
(grep, awk, sed, tr) are unwieldy or inconvenient. <em>string</em>
provides operations that are common in most languages, such as trimming,
splitting, and transforming letter case. The <em>string</em> command
also makes it easy to join JSON string arrays into a single string using
a delimiter or split a string into a JSON array based on a delimiter.
The form of the command is
<code>string [OPTIONS] [ACTION] [ACTION_PARAMETERS...]</code></p>
<pre class="shell"><code> string toupper "one two three"</code></pre>
<p>This would yield “ONE TWO THREE”.</p>
<p>Some of the features included:</p>
<ul>
<li>change case (upper, lower, title, English title)</li>
<li>length, position and count of substrings</li>
<li>has prefix, suffix or contains</li>
<li>trim prefix, suffix and cutsets</li>
<li>split and join to/from JSON string arrays</li>
</ul>
<p>See <a href="docs/string/">string</a> for full details.</p>
<h2 id="installation">Installation</h2>
<p>See <a
href="https://caltechlibrary.github.io/datatools/INSTALL.html">INSTALL.md</a>
for details on installing pre-compiled versions of the programs.</p>
</section>
<footer-global></footer-global>
</body>
</html>