Dave Jarvis' Repositories

git clone https://repo.autonoma.ca/repo/segmenter.git

Renamed file for rendering on repository host page.

Author: djarvis <email>
Date: 2018-12-17 23:58:30 GMT-0800
Commit: 08c47d6a1211a791c4b656eda29ce369d9650eaa
Parent: 1030ac5
README.md
+ ================================================
+ Word Split: Text Segmentation Tool
+ ================================================
+
+Word Split is a Java application for Java software developers. The application
+provides a way to split conjoined words that lack punctuation into separate
+words. The software is intended to split database column names into their
+human-readable equivalent text.
+
+Word Split takes the following input files:
+ - a probability lexicon, one word and probability per line (CSV format)
+ - a list of conjoined phrases, one per line
+
+Word Split will use the lexicon to separate the list of conjoined phrases.
+The resulting segmented phrases are written to standard out.
+
+ ---------------------------------------------
+ Contents
+ ---------------------------------------------
+
+This release includes:
+
+ - README.txt This file
+ - LICENSE.txt License for Word Split
+
+ - version.properties Build version
+ - build.xml Build instructions for Ant
+ - demos Example lexicons and conjoined files
+ - scripts Corpus and lexicon helper scripts
+ - scripts/tally-corpus.sh Creates a tally lexicon from a corpus.
+ - scripts/probability.awk Creates a probability lexicon from tallies
+
+ ---------------------------------------------
+ Requirements
+ ---------------------------------------------
+
+The following software packages are required to compile and run Word Split:
+
+ - Java version 1.6 (or greater)
+ - Ant version 1.8.2 (or greater)
+
+ ---------------------------------------------
+ Installation
+ ---------------------------------------------
+
+Installation is complete by unzipping the archive.
+
README.txt
- ================================================
- Word Split: Text Segmentation Tool
- ================================================
-
-Word Split is a Java application for Java software developers. The application
-provides a way to split conjoined words that lack punctuation into separate
-words. The software is intended to split database column names into their
-human-readable equivalent text.
-
-Word Split takes the following input files:
- - a probability lexicon, one word and probability per line (CSV format)
- - a list of conjoined phrases, one per line
-
-Word Split will use the lexicon to separate the list of conjoined phrases.
-The resulting segmented phrases are written to standard out.
-
- ---------------------------------------------
- Contents
- ---------------------------------------------
-
-This release includes:
-
- - README.txt This file
- - LICENSE.txt License for Word Split
-
- - version.properties Build version
- - build.xml Build instructions for Ant
- - demos Example lexicons and conjoined files
- - scripts Corpus and lexicon helper scripts
- - scripts/tally-corpus.sh Creates a tally lexicon from a corpus.
- - scripts/probability.awk Creates a probability lexicon from tallies
-
- ---------------------------------------------
- Requirements
- ---------------------------------------------
-
-The following software packages are required to compile and run Word Split:
-
- - Java version 1.6 (or greater)
- - Ant version 1.8.2 (or greater)
-
- ---------------------------------------------
- Installation
- ---------------------------------------------
-
-Installation is complete by unzipping the archive.
-
Delta: 47 lines added, 47 lines removed
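
The README in this commit says Word Split uses a probability lexicon to separate conjoined phrases, but the diff does not show how. The following is a minimal sketch of one common approach, a unigram model solved with dynamic programming; it is not taken from the Word Split source. The class name `SegmentSketch`, the method `segment`, the in-memory `Map` standing in for the CSV lexicon, and the unknown-word penalty are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SegmentSketch {
    // Assumed per-character penalty for text that is not in the lexicon.
    private static final double UNKNOWN_LOG_PROB = Math.log(1e-10);

    /**
     * Splits conjoined text into the word sequence with the highest total
     * log-probability, treating each word as independent (unigram model).
     */
    public static List<String> segment(String text, Map<String, Double> lexicon) {
        int n = text.length();
        double[] best = new double[n + 1]; // best[i]: best score for text[0..i)
        int[] back = new int[n + 1];       // back[i]: start of the last word ending at i

        for (int i = 1; i <= n; i++) {
            best[i] = Double.NEGATIVE_INFINITY;
            for (int j = 0; j < i; j++) {
                String word = text.substring(j, i);
                Double p = lexicon.get(word);
                double wordScore = (p != null)
                        ? Math.log(p)
                        : UNKNOWN_LOG_PROB * word.length();
                double score = best[j] + wordScore;
                if (score > best[i]) {
                    best[i] = score;
                    back[i] = j;
                }
            }
        }

        // Walk the back-pointers from the end to recover the words.
        List<String> words = new ArrayList<String>();
        for (int i = n; i > 0; i = back[i]) {
            words.add(text.substring(back[i], i));
        }
        Collections.reverse(words);
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical lexicon entries; a real lexicon would be read from CSV.
        Map<String, Double> lexicon = new HashMap<String, Double>();
        lexicon.put("account", 0.002);
        lexicon.put("number", 0.004);
        lexicon.put("a", 0.02);

        System.out.println(segment("accountnumber", lexicon)); // prints [account, number]
    }
}
```

Scoring in log space avoids underflow when many small probabilities are multiplied, and scaling the unknown penalty by length keeps the sketch from preferring one long unrecognized run over known words.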