Geeklog Forums

COM_getTextContent issue

When the text contains accented characters, COM_getTextContent replaces them with "?".

Text Formatted Code
/**
 * Turn a piece of HTML into continuous(!) plain text
 *
 * This function removes HTML tags, line breaks, etc. and returns one long
 * line of text. This is useful for word counts (do an explode() on the result)
 * and for text excerpts.
 *
 * @param    string  $text   original text, including HTML and line breaks
 * @return   string          continuous plain text
 */
function COM_getTextContent($text)
{
    // replace <br> with spaces so that Text<br>Text becomes two words
    $text = preg_replace('/\<br(\s*)?\/?\>/i', ' ', $text);

    // add extra space between tags, e.g. <p>Text</p><p>Text</p>
    $text = str_replace('><', '> <', $text);

    // only now remove all HTML tags
    $text = strip_tags($text);

    // replace all tabs, newlines, and carriage returns with spaces
    $text = str_replace(array("\011", "\012", "\015"), ' ', $text);

    // replace entities with plain spaces
    $text = str_replace(array('&#20;', '&#160;', '&nbsp;'), ' ', $text);

    // collapse whitespace
    $text = preg_replace('/\s\s+/', ' ', $text);

    return trim($text);
}

How can we solve this?
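
The thread does not establish which call is at fault, so the following is only a candidate to experiment with, not a confirmed fix. It assumes the text is UTF-8 and that a byte-oriented `\s` in one of the regular expressions may be matching part of a multibyte character. The function name getTextContentUtf8 is hypothetical and not part of Geeklog; the sketch uses the `u` pattern modifier so that PCRE works on whole characters.

```php
<?php
// Hypothetical variant for experimentation; NOT the Geeklog implementation.
// Assumption: $text is UTF-8 encoded.
function getTextContentUtf8(string $text): string
{
    // replace <br> with spaces; only ASCII whitespace is allowed inside the tag
    $text = preg_replace('/<br[ \t\r\n]*\/?>/i', ' ', $text);

    // add extra space between adjacent tags, then remove all HTML tags
    $text = str_replace('><', '> <', $text);
    $text = strip_tags($text);

    // replace entities with plain spaces (same list as the original function)
    $text = str_replace(array('&#20;', '&#160;', '&nbsp;'), ' ', $text);

    // collapse whitespace (tabs and newlines included); the "u" modifier makes
    // PCRE treat the subject as UTF-8, so \s matches whole characters only
    $collapsed = preg_replace('/\s+/u', ' ', $text);
    if ($collapsed === null) {
        // preg_replace returns null when the input is not valid UTF-8;
        // fall back to collapsing ASCII whitespace only
        $collapsed = preg_replace('/[ \t\r\n]+/', ' ', $text);
    }

    return trim($collapsed);
}

echo getTextContentUtf8("<p>déjà vu</p><p>café<br>crème</p>"), "\n";
```

If the accented characters survive here but not in COM_getTextContent, that would point at the whitespace handling; if they are lost in both, the cause is probably elsewhere (for example the page or database encoding).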


Quote by: ::Ben

How can we solve this?

Try commenting out the str_replace and preg_replace function calls one at a time and monitor the output, so you can narrow it down to exactly which function call(s) cause the problem. Then it should be easier to come up with a fix.
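
Instead of commenting calls out by hand, the same narrowing-down could be scripted. The sketch below runs the steps of COM_getTextContent quoted earlier one at a time and records whether the text is still valid UTF-8 after each one. The helper name traceTextContentSteps is hypothetical, and it assumes the input is UTF-8 and that the mbstring extension is loaded (for mb_check_encoding).

```php
<?php
// Hypothetical debugging helper; not part of Geeklog.
// Assumptions: input is UTF-8 and the mbstring extension is available.
function traceTextContentSteps(string $text): array
{
    // the same operations as COM_getTextContent, in the same order
    $steps = array(
        'br to space'         => function ($t) { return preg_replace('/\<br(\s*)?\/?\>/i', ' ', $t); },
        'separate tags'       => function ($t) { return str_replace('><', '> <', $t); },
        'strip_tags'          => function ($t) { return strip_tags($t); },
        'control chars'       => function ($t) { return str_replace(array("\011", "\012", "\015"), ' ', $t); },
        'entities'            => function ($t) { return str_replace(array('&#20;', '&#160;', '&nbsp;'), ' ', $t); },
        'collapse whitespace' => function ($t) { return preg_replace('/\s\s+/', ' ', $t); },
    );

    $report = array();
    foreach ($steps as $name => $step) {
        $text = $step($text);
        // true while the intermediate result is still well-formed UTF-8
        $report[$name] = mb_check_encoding($text, 'UTF-8');
    }

    return $report;
}

// sample input with accented characters, written as Unicode escapes
$sample = "<p>caf\u{00E9} cr\u{00E8}me</p><p>d\u{00E9}j\u{00E0} vu</p>";
foreach (traceTextContentSteps($sample) as $name => $valid) {
    echo $name, ': ', $valid ? 'valid UTF-8' : 'BROKEN', "\n";
}
```

The first step reported as BROKEN, if any, is the call to look at. The result may depend on the server's locale and PHP version, so it is worth running on the installation that shows the problem.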

BTW, shouldn't this be a bug report in the tracker?
