Geeklog Forums

A better COM_extractLinks


asmaloney

Forum User
Full Member
Registered: 02/08/04
Posts: 214
I noticed a problem with the "What's Related" box - it would only add links up to the first image. So I took a look at the function, made it more efficient, and fixed the problem.

The main difference in the match is that this one doesn't recognize any tags in between <a href=""> and </a>. So, for example, <a href="..."><b>foo</b></a> will not be matched. Maybe I'll futz around with it to handle this if anyone's interested.


function COM_extractLinks( $fulltext, $maxlength = 26 )
{
    $rel = array();

    preg_match_all( "/(<a href=[^>]+>)([^<]*)(<\/a>)/i", $fulltext, $matches );

    for ( $i = 0; $i < count( $matches[0] ); $i++ )
    {
        // if link is too long, shorten it and add ... at the end
        if ( ( $maxlength > 0 ) && ( strlen( $matches[2][$i] ) > $maxlength ) )
        {
            $matches[2][$i] = substr( $matches[2][$i], 0, $maxlength - 3 ) . '...';
            $matches[0][$i] = $matches[1][$i] . $matches[2][$i] . $matches[3][$i];
        }

        $rel[] = COM_checkHTML( $matches[0][$i] );
    }

    return $rel;
}
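
To illustrate the limitation described above, here is a minimal sketch that runs the same pattern over three links, one of which wraps its text in a tag. The sample HTML and variable names are made up for the demonstration; only the pattern comes from the post.

```php
<?php
// Pattern from the post above: the link text may not contain '<',
// so any tag between <a ...> and </a> prevents a match.
$pattern = '/(<a href=[^>]+>)([^<]*)(<\/a>)/i';

// Hypothetical story text: the middle link has nested markup.
$html = '<a href="one.html">plain</a> '
      . '<a href="two.html"><b>bold</b></a> '
      . '<a href="three.html">after</a>';

preg_match_all( $pattern, $html, $matches );

// $matches[2] holds the link texts: 'plain' and 'after'.
// The nested link is skipped, but links after it are still found.
print_r( $matches[2] );
```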


[Note I did not post this in HTML mode because it changed some of the code into smilies...]

vinny

Site Admin
Admin
Registered: 06/24/02
Posts: 352
Location:Colorado, USA
In case you're curious, this is what we went with:

Text Formatted Code

function COM_extractLinks( $fulltext, $maxlength = 26 )
{
    $rel = array();

    preg_match_all( "/(<a.*?href=\"(.*?)\".*?>)(.*?)(<\/a>)/", $fulltext, $matches );
    for ( $i=0; $i< count( $matches[0] ); $i++ )
    {
        $matches[3][$i] = strip_tags( $matches[3][$i] );
        if ( !strlen( trim( $matches[3][$i] ) ) ) {
            $matches[3][$i] = strip_tags( $matches[2][$i] );
        }

        // if link is too long, shorten it and add ... at the end
        if ( ( $maxlength > 0 ) && ( strlen( $matches[3][$i] ) > $maxlength ) )
        {
            $matches[3][$i] = substr( $matches[3][$i], 0, $maxlength - 3 ) . '...';
        }

        $rel[] = $matches[1][$i] . $matches[3][$i] . $matches[4][$i];
    }

    return( $rel );
}

 


Or you can take a look at it (without the smiley faces) in lib-common.php.

Blaine

Forum User
Moderator
Registered: 07/16/02
Posts: 1232
Location:Canada
I'm just curious: is there any reason you are not using the [ code ] bb tags when posting code in the forum?

If it is not working well, I'd like to know.
Geeklog components by PortalParts -- www.portalparts.com

asmaloney

Forum User
Full Member
Registered: 02/08/04
Posts: 214

Vinny - thanks for posting that. Don't we want a case-insensitive match though?

Blaine - I did that because posting it using CODE still translates parts of the code into smilies,

e.g.
Text Formatted Code

 preg_match_all( "/(<a href=[^>]+&gt<img align=absmiddle src='images/smilies/wink.gif' alt='Wink'>([^<]*)(</a&gt<img align=absmiddle src='images/smilies/wink.gif' alt='Wink'>/i", $fulltext, $matches );

 

Blaine

Forum User
Moderator
Registered: 07/16/02
Posts: 1232
Location:Canada
Yeah, it appears that recent updates to GL have affected this feature.

vinny

Site Admin
Admin
Registered: 06/24/02
Posts: 352
Location:Colorado, USA
Blaine,

I put my code snippet in the code tags, but it put the smileys in there anyway.

Also, I'm sure you noticed but the QUOTE tags are acting funny as well.

-Vinny

Blaine

Forum User
Moderator
Registered: 07/16/02
Posts: 1232
Location:Canada
Quote by vinny: I put my code snippet in the code tags, but it put the smileys in there anyway.

Also, I'm sure you noticed but the QUOTE tags are acting funny as well.


With the geeklog.net upgrade, the allowable HTML was changed. I need the pre tags for the code block formatting. Dirk fixed it a few hours ago. Let's see if that fixed both the quotes and code formatting.

vinny

Site Admin
Admin
Registered: 06/24/02
Posts: 352
Location:Colorado, USA
I added the case-insensitive flag to the regex for COM_extractLinks. (Good catch, asmaloney.) It should show up in -rc2, and if not there, then in the final release of 1.3.9.

-Vinny
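
As a rough illustration of why the flag matters, the sketch below runs the pattern with and without the trailing i modifier on a made-up string containing an upper-case tag. The sample HTML and variable names are assumptions for the demonstration.

```php
<?php
// Hypothetical input: one upper-case link and one lower-case link.
$html = '<A HREF="upper.html">Upper</A> <a href="lower.html">lower</a>';

// Same pattern twice; the only difference is the trailing "i" modifier.
$sensitive   = '/(<a.*?href="(.*?)".*?>)(.*?)(<\/a>)/';
$insensitive = '/(<a.*?href="(.*?)".*?>)(.*?)(<\/a>)/i';

// preg_match_all() returns the number of full matches.
$countSensitive   = preg_match_all( $sensitive, $html, $m1 );
$countInsensitive = preg_match_all( $insensitive, $html, $m2 );

// Without "i" only the lower-case link is found; with "i" both are.
echo $countSensitive . ' vs ' . $countInsensitive . "\n";
```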

Anonymous
GL 1.3.9sr1 still has the problem that links are only added up to the first image. Perhaps someone could finally get this messy preg_match_all() sorted out?

vinny

Site Admin
Admin
Registered: 06/24/02
Posts: 352
Location:Colorado, USA
I've just tested this in GL 1.3.9sr1 and it works, except when you have a link like this:

Text Formatted Code

<a href="link1">[image1]</a>

 


which just won't show up in the What's Related box, though links after this image still will. I'll work on this last little bug related to COM_extractLinks(). If you can demonstrate another bug, please post a URL so I can see it.

Thanks,
Vinny

asmaloney

Forum User
Full Member
Registered: 02/08/04
Posts: 214
I still have this problem too. I have a story with images [which themselves are links to unscaled versions] and none of the links on the page show up in What's Related.

Here's an example from my site.

vinny

Site Admin
Admin
Registered: 06/24/02
Posts: 352
Location:Colorado, USA
Your problem has nothing to do with images; you use single quotes instead of double quotes in your links, i.e.

Text Formatted Code

<a href='link1'>link1</a>
--instead of--
<a href="link1">link1</a>

 


The HTML spec calls for the use of double quotes. I'll see about accepting both when 1.3.10 is released though.

-Vinny
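
One possible way to accept both quote styles is to capture the opening quote and require the same character to close the attribute via a backreference. This is only a sketch under that assumption, not the pattern that shipped in Geeklog; the sample HTML and variable names are invented for the example.

```php
<?php
// Sketch: group 2 captures whichever quote opens the href value,
// and \2 requires the same quote character to close it.
// Groups: 1 = opening tag, 2 = quote, 3 = URL, 4 = link text, 5 = closing tag.
$pattern = '/(<a\b[^>]*?href=(["\'])(.*?)\2[^>]*>)(.*?)(<\/a>)/is';

// Hypothetical input mixing single- and double-quoted attributes.
$html = "<a href='single.html'>one</a> <a href=\"double.html\">two</a>";

preg_match_all( $pattern, $html, $matches );

// $matches[3] holds the URLs: 'single.html' and 'double.html'.
print_r( $matches[3] );
```

Note that the extra capture group shifts the indexes compared with the function posted earlier in the thread, so any code consuming $matches would need to be adjusted to match.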

asmaloney

Forum User
Full Member
Registered: 02/08/04
Posts: 214
Quote by vinny: Your problem has nothing to do with images; you use single quotes instead of double quotes in your links, i.e.



Heh. That was one of the first things I checked. In Firefox, if you select some text on the page and use the context menu to 'View Selection Source', it shows double quotes even though the page source shows single quotes... I guess I out-Foxed myself.

Quote by vinny:
The HTML spec calls for the use of double quotes. I'll see about accepting both when 1.3.10 is released though.


Yet the W3C validator accepts single-quoted attributes just fine.

Thanks for catching that for me.

vinny

Site Admin
Admin
Registered: 06/24/02
Posts: 352
Location:Colorado, USA
The next version of Geeklog (1.3.10) will have a COM_extractLinks that supports single quotes and also nested HTML tags (including images, i.e. [imageX]).

-Vinny