Class TextIndexer

Indexes HTML into keywords.

Inheritance
System.Object
TextIndexer
Inherited Members
System.Object.ToString()
System.Object.Equals(System.Object)
System.Object.Equals(System.Object, System.Object)
System.Object.ReferenceEquals(System.Object, System.Object)
System.Object.GetHashCode()
System.Object.GetType()
System.Object.MemberwiseClone()
Namespace: EPiServer.Core.Html
Assembly: EPiServer.dll
Version: 8.11.0
Syntax
public class TextIndexer

Constructors

TextIndexer()

Declaration
public TextIndexer()

Methods

IndexHtml(HtmlParser)

Indexes HTML into keywords.

Declaration
[Obsolete("This method is no longer being used to index HTML into words, indexing is now handled through the Indexing Service")]
public static WordCollection IndexHtml(HtmlParser htmlParser)
Parameters
Type Name Description
HtmlParser htmlParser

An HtmlParser containing the parsed HTML to index.

Returns
Type Description
WordCollection

A collection of keywords

StripHtml(String, Int32)

Strips all HTML elements from an HTML string and returns the text.

Declaration
public static string StripHtml(string htmlText, int maxTextLengthToReturn)
Parameters
Type Name Description
System.String htmlText

An HTML string.

System.Int32 maxTextLengthToReturn

Maximum length of the string to return. 0 returns all text in the HTML string.

Returns
Type Description
System.String

A string containing the text of the HTML string, without markup.

Remarks

Note that maxTextLengthToReturn counts an HTML entity as one character.

If the stripped text is longer than the specified maximum length, the string is truncated and "..." is appended at the end.

The algorithm in StripHtml works as follows:

All tags are removed, and each one is replaced with a space. Every whitespace character is then replaced with a space, and consecutive spaces are merged into a single space.
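The steps described in these remarks can be sketched as follows. This is a minimal illustration, not the EPiServer implementation: the class name StripHtmlSketch, the method name Strip, the regular expressions, and the trimming of leading and trailing spaces are all assumptions made for the example. It also counts an HTML entity by its raw character length, unlike the behavior documented above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of EPiServer. It only illustrates the
// steps described in the remarks for StripHtml(String, Int32).
public static class StripHtmlSketch
{
    // Assumption: a "tag" is '<', any run of characters other than '>', then '>'.
    private static readonly Regex Tag = new Regex("<[^>]*>");

    // One or more whitespace characters (spaces, tabs, line breaks).
    private static readonly Regex Whitespace = new Regex(@"\s+");

    public static string Strip(string htmlText, int maxLength)
    {
        if (string.IsNullOrEmpty(htmlText))
        {
            return string.Empty;
        }

        // Step 1: remove every tag, leaving a space in its place.
        string text = Tag.Replace(htmlText, " ");

        // Steps 2 and 3: turn each run of whitespace into a single space.
        // Trimming the ends is an assumption; the remarks do not mention it.
        text = Whitespace.Replace(text, " ").Trim();

        // A maxLength of 0 means "return everything". Otherwise truncate
        // and append "..." as described in the remarks.
        if (maxLength > 0 && text.Length > maxLength)
        {
            text = text.Substring(0, maxLength) + "...";
        }

        return text;
    }

    public static void Main()
    {
        // Prints "Hello, world !" (the closing </b> tag becomes a space).
        Console.WriteLine(Strip("<p>Hello,\n  <b>world</b>!</p>", 0));

        // Prints "Hello..."
        Console.WriteLine(Strip("<p>Hello, <b>world</b>!</p>", 5));
    }
}
```

Because each tag is replaced with a space rather than removed outright, text that was separated only by markup (for example `a<br/>b`) stays separated in the result (`a b`), at the cost of an occasional extra space before punctuation.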

StripHtml(String, Int32, Int32, String)

Strips all HTML elements from an HTML string and returns the text.

Declaration
public static string StripHtml(string htmlText, int maxTextLengthToReturn, int maxWordSize, string moreTextMarker)
Parameters
Type Name Description
System.String htmlText

The HTML text.

System.Int32 maxTextLengthToReturn

The maximum text length to return.

System.Int32 maxWordSize

Maximum word size used when placing the moreTextMarker at the end of the string.

System.String moreTextMarker

The marker placed at the end of the returned string to indicate that more text follows.

Returns
Type Description
System.String

A string without HTML markup.

Extension Methods