Naanou.Common.Search
Class Distiller

Distills metadata or a string for uniform indexing and querying, e.g. changing 'You had a great time on that horse' to 'great time horse'.
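
The transformation described above can be sketched as follows. This is a minimal illustration, not the Distiller source: the class name DistillSketch, the stop word list, and the choice to split on non-letter characters are all assumptions made so the documented example can be reproduced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative sketch only; not the Naanou implementation.
public static class DistillSketch
{
    // Hypothetical stop word list, chosen to reproduce the documented example.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "you", "had", "a", "on", "that", "the" },
        StringComparer.OrdinalIgnoreCase);

    // Lowercases the text, splits it on runs of non-letter characters,
    // and drops empty entries and stop words.
    public static string Distill(string text)
    {
        IEnumerable<string> words = Regex
            .Split(text.ToLowerInvariant(), @"[^\p{L}]+")
            .Where(w => w.Length > 0 && !StopWords.Contains(w));
        return string.Join(" ", words);
    }
}
```

With this stop word list, DistillSketch.Distill("You had a great time on that horse") returns "great time horse".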

Field Summary
System.Globalization.CultureInfo culture
double FILE_WEIGHT
         If the percentage of stop words is under this amount, they are taken out.
double META_WEIGHT
         If the percentage of stop words is under this amount, they are taken out.
string[] stopWords

Constructor Summary
Distiller()
        Initializes a new instance of the Distiller class.

Method Summary
string[] DistillFilename(string filename)
         Distills a filename into a set of words and hashes them.
string DistillMeta(string key, string val)
         Distills a metadata value by taking out non-critical information, and hashes key:val
string[] MarkAndSplit(string source, int[] marks, int marked)
         Splits up source text into words and marks words that are stop words or non-alphabetical text.
string[] SplitWords(string words)
         Splits a text string into words

Methods inherited from class System.Object
Equals, Finalize, GetHashCode, GetType, MemberwiseClone, ToString


Field Detail

culture

private System.Globalization.CultureInfo culture


FILE_WEIGHT

private double FILE_WEIGHT

If the percentage of stop words is under this amount, they are taken out.


META_WEIGHT

private double META_WEIGHT

If the percentage of stop words is under this amount, they are taken out.
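
The threshold rule shared by FILE_WEIGHT and META_WEIGHT might look like the sketch below. The actual values of the two fields are not documented, so the threshold is passed in as a parameter and is assumed here to be a fraction between 0 and 1. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch only; not the Naanou implementation.
public static class ThresholdSketch
{
    // Removes stop words only when they make up less than `threshold`
    // (assumed to be a fraction in [0, 1]) of all words.
    // Otherwise the input is returned unchanged.
    public static string[] FilterStopWords(
        string[] words, HashSet<string> stopWords, double threshold)
    {
        if (words.Length == 0)
        {
            return words;
        }

        int stopCount = words.Count(w => stopWords.Contains(w));
        double fraction = (double)stopCount / words.Length;

        return fraction < threshold
            ? words.Where(w => !stopWords.Contains(w)).ToArray()
            : words;
    }
}
```

Under this reading, a text that consists mostly of stop words keeps them, so it is not distilled down to nothing. That motivation is an inference from the field description, not something the source states.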


stopWords

private string[] stopWords

Constructor Detail

Distiller

private Distiller()

Initializes a new instance of the Distiller class.

Method Detail

DistillFilename

public string[] DistillFilename(string filename)

Distills a filename into a set of words and hashes them.

Example: "Morcheeba - Big Calm: 01 The Sea.mp3" would be converted into {morcheeba, big, calm, sea}.

Parameters:
filename - Filename
Returns:
Filtered words of filename
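
A rough sketch of the word-extraction part of this method, under stated assumptions: the extension is cut at the last dot, anything that is not a letter (including track numbers) is treated as a separator, and "the" is on the stop word list. The hashing step is left out because the source does not say which hash is used. FilenameSketch and its stop word list are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative sketch only; not the Naanou implementation.
public static class FilenameSketch
{
    // Hypothetical stop word list.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "a", "an", "of" },
        StringComparer.OrdinalIgnoreCase);

    public static string[] DistillFilename(string filename)
    {
        // Drop the extension by hand; the name may contain ':' which
        // path helpers can interpret as a volume separator.
        int dot = filename.LastIndexOf('.');
        string stem = dot > 0 ? filename.Substring(0, dot) : filename;

        // Split on runs of non-letters, so digits and punctuation vanish.
        return Regex
            .Split(stem.ToLowerInvariant(), @"[^\p{L}]+")
            .Where(w => w.Length > 0 && !StopWords.Contains(w))
            .ToArray();
    }
}
```

For "Morcheeba - Big Calm: 01 The Sea.mp3" this sketch yields morcheeba, big, calm, sea, matching the documented example.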

DistillMeta

public string DistillMeta(string key,
                          string val)

Distills a metadata value by taking out non-critical information, and hashes key:val

Parameters:
key - Key
val - Value
Returns:
Distilled value

MarkAndSplit

private string[] MarkAndSplit(string source,
                              int[] marks,
                              int marked)

Splits up source text into words and marks words that are stop words or non-alphabetical text.

Parameters:
source - Source text
marks - Marks (0 is a word, 1 is a stop word, 2 is punctuation)
marked - How many marks were made
Returns:
All words
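
The marking scheme can be illustrated with the sketch below. It assumes that marks and marked are output parameters (the descriptions suggest this, but the signature shown does not say so) and that marked counts the tokens marked 1 or 2. The tokenizing pattern, the stop word list, and the class name are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Illustrative sketch only; not the Naanou implementation.
public static class MarkAndSplitSketch
{
    public const int Word = 0, StopWord = 1, NonAlpha = 2;

    // Hypothetical stop word list.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "a", "on" },
        StringComparer.OrdinalIgnoreCase);

    public static string[] MarkAndSplit(string source, out int[] marks, out int marked)
    {
        // A token is either a run of letters or a run of
        // non-letter, non-whitespace characters (digits, punctuation).
        MatchCollection tokens = Regex.Matches(source, @"\p{L}+|[^\p{L}\s]+");

        var words = new string[tokens.Count];
        marks = new int[tokens.Count];
        marked = 0;

        for (int i = 0; i < tokens.Count; i++)
        {
            string token = tokens[i].Value;
            words[i] = token;

            if (!char.IsLetter(token[0]))
                marks[i] = NonAlpha;
            else if (StopWords.Contains(token))
                marks[i] = StopWord;
            else
                marks[i] = Word;

            // Assumption: "marked" counts every non-zero mark.
            if (marks[i] != Word)
                marked++;
        }

        return words;
    }
}
```

For the input "The Sea: 01" this sketch returns the tokens The, Sea, :, 01 with marks 1, 0, 2, 2 and marked equal to 3.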

SplitWords

private string[] SplitWords(string words)

Splits a text string into words

Parameters:
words - Text
Returns:
Words