Geeklog Forums
Search API for Plugins
sbarakat
I have spent the last month or so reworking the search area of the Geeklog site as part of my Google Summer of Code project. More details can be found here. The work I am doing has resulted in the need to modify the part of the plugin API that is involved with the search page, specifically how the plugin_dopluginsearch_() function operates. This post is really aimed at plugin developers, so stop reading now if you are not one, as you may find this a little boring.
The plugin_dopluginsearch_() function using the new API system will operate in this way:
1. Build search SQL string using the input parameters.
2. Return the query along with some plugin information.
And that's it. A check is no longer needed to see whether the type is appropriate for the plugin: when a user searches for a single type, only that plugin's function will be executed. Also, the SQL statement is no longer executed in the function; instead it is handled by the core.
The new API system requires the following variables to be returned by the plugin_dopluginsearch_ function:
Plugin Name
* A name to display to the user, e.g. 'Static Page' (singular preferred)
* A name used locally to identify the plugin, e.g. 'staticpages'
A search rank
* An integer from 1 to 5 that determines how many results to extract from the query
Std. SQL Query
* A string containing a single query
* OR An array of two queries for both mssql and mysql DBMS
FT. SQL Query (optional)
* A string containing a single query using the Full-Text search method
* OR An array of two queries for both mssql and mysql DBMS using the Full-Text search method
The rank is a new concept that has been introduced. All the results are now combined into a single list, so the system needs to differentiate important plugins from less important ones. The higher the rank, the greater the number of results that will be displayed. The reasoning behind this is that when a user makes a search they often want to search the main areas of a site, like the stories or forums. They may be less concerned with comment or calendar results. So it would be best to provide the user with more results from the main parts of the site. A rank of 1 will show the fewest results, whereas a rank of 5 shows the most results from the plugin.
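The post does not say how a rank is translated into a number of results, so the following is only a sketch under an assumed rule: each plugin is asked for a number of rows proportional to its rank. The function name allocate_results_by_rank() and the $perRank figure are hypothetical and not part of Geeklog.

```php
<?php
// Hypothetical helper, not part of Geeklog. Assumes the core asks each
// plugin for ($rank * $perRank) rows, so rank 5 yields five times the
// rows of rank 1.
function allocate_results_by_rank(array $ranks, $perRank = 10)
{
    $limits = array();
    foreach ($ranks as $plugin => $rank) {
        // Clamp to the documented 1..5 range before scaling.
        $rank = max(1, min(5, (int) $rank));
        $limits[$plugin] = $rank * $perRank;
    }
    return $limits;
}

$limits = allocate_results_by_rank(array('stories' => 5, 'calendar' => 1));
// $limits is array('stories' => 50, 'calendar' => 10)
```

Whatever rule the core really uses, the point is the same: the plugin only states its rank, and the core decides how many rows that buys.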
All SQL statements returned should be without the LIMIT and ORDER BY clauses. The SQL statement must have the following column names and look like:
SELECT id, title, description, date, user, hits, url FROM ... WHERE ...
The url is where to take the user when they click a result. It should start with a single slash if the target is within the current domain; otherwise it should start with 'http://' or 'www.' if the target is an external one. For example, this url belongs to the Geeklog site:
CONCAT('/staticpages/index.php?page=', sp.sp_id) AS url
However the Links plugin needs to direct users to external sites when clicked, so providing the url from the database will suffice.
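As an illustration of the url rule above, here is a minimal sketch of how the core might turn the url column into a link target. The function name resolve_result_url() and the $siteUrl parameter are assumptions made for this example; the real core code may handle this differently.

```php
<?php
// Hypothetical helper showing one way to apply the url rule.
// $siteUrl stands in for the site's base URL.
function resolve_result_url($url, $siteUrl)
{
    if (strpos($url, '/') === 0) {
        // Internal target: prefix with the site's base URL.
        return rtrim($siteUrl, '/') . $url;
    }
    if (strpos($url, 'www.') === 0) {
        // External target given without a scheme.
        return 'http://' . $url;
    }
    // Anything else (e.g. 'http://...') is used as-is.
    return $url;
}
```

With this rule a plugin like Static Pages only has to return a path, while the Links plugin can return whatever is stored in its table.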
The Full-Text queries are optional and should only be used if the table being searched has been indexed properly. The Full-Text queries will only be executed if the admin/site maintainer has enabled Full-Text searches. This part is still in the experimental phase, so if you are unsure what Full-Text indexes are you can ignore this.
Now that you know what is required, how would you prefer to pass this information back to the core? In Geeklog versions prior to 1.5.1 the function would create an instance of the Plugin() class, assign the parameters and then return the instance. To keep future versions of Geeklog compatible with the old API the Plugin class is not going anywhere any time soon, so we could add some extra parameters to it and keep it in use. So the process of returning the information using the new API would be as follows:
$sql = 'SELECT sp_id AS id, sp_title AS title, ... FROM ... WHERE ...';
$plugin = new Plugin();
$plugin->setName('staticpages', 'Static Pages');
$plugin->setRank(4);
$plugin->setSQL($sql);
$plugin->setFTSQL($ftsql);
return $plugin;
Or if developers would prefer we could move away from the Plugin() class so that maybe sometime in the future (GL 1.6?) when all plugins have migrated the class can be abandoned. So we can have another class called SearchPlugin, which could work in a similar way:
$plugin = new SearchPlugin();
$plugin->setName('staticpages', 'Static Pages');
$plugin->setRank(4);
$plugin->setSQL($sql);
$plugin->setFTSQL($ftsql);
return $plugin;
Or finally we could remove the need for a class completely by just returning an associative array with the information.
return array(
'name' => 'staticpages',
'fullname' => 'Static Pages',
'rank' => 4,
'sql' => $sql,
'ftsql' => $ftsql);
This last option seems to be simplest but I suppose there could be problems in the future when some of the parameters need to be changed. So which would you prefer? I have held off documenting the API in the Wiki until a format has been settled on.
Also just a little note, the input parameters to the plugin_dopluginsearch_ function will remain the same to stay compatible with the old API system. Although if using the new API there is no longer a need for the $page and $perpage parameters as the plugin no longer handles the paging or ordering of the results.
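To make the "return the query instead of executing it" idea concrete, here is a sketch of a plugin function that uses the array option. Everything specific in it is an assumption: the 'example' plugin, its example_items table and its column names are hypothetical, and the parameter list is the legacy signature as best understood here, with $page and $perpage accepted but unused.

```php
<?php
// Sketch only: the 'example' plugin, its table and its columns are
// hypothetical. The parameter list is assumed to match the legacy API.
function plugin_dopluginsearch_example($query, $datestart, $dateend, $topic,
                                       $type, $author, $keyType, $page, $perpage)
{
    // Escape the search term; real code should use the database layer's
    // own escaping function instead of addslashes().
    $term = addslashes($query);

    // No LIMIT or ORDER BY: the core adds paging and ordering.
    $sql = "SELECT item_id AS id, title, body AS description, "
         . "created AS date, owner_id AS user, hits, "
         . "CONCAT('/example/index.php?item=', item_id) AS url "
         . "FROM example_items "
         . "WHERE title LIKE '%$term%' OR body LIKE '%$term%'";

    // The optional 'ftsql' entry is left out in this sketch.
    return array(
        'name'     => 'example',
        'fullname' => 'Example Item',
        'rank'     => 3,
        'sql'      => $sql,
    );
}
```

The same body would work with either class-based option; only the last statement changes.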
jmucchiello
For similar reasons, leave the Plugin class alone for backward compatibility with plugin_dopluginsearch_$plugin. Turning what the plugin returns into an array or using an object is just a style issue. If you go with a class please call it SearchCriteria since that is what is being returned. YMMV. Returning as an array is more PHP4 friendly but is that something we need to still worry about? I don't know.
Also, it would be nice if plugins could return multiple SearchCriteria objects. Remember my comments from when you started. Some plugins maintain more than one type of object. If you are going to take the list of SearchCriteria and UNION all the sql together:
(select ... from plugin1_tbl ...)
union (select .... from plugin2_tbl ...)
union (select ... from plugin3_tbl ...)
then my plugin should return two objects to avoid this SQL:
(select ... from plugin2_tbl ...)
union (select .... from plugin2_tbl1 ... union select ... from plugin2_tbl2 ....)
union (select ... from plugin3_tbl ...)
Hope that helps.
sbarakat
Create a new API function: plugin_advancedsearch_$plugin. Two reasons: 1) eliminates legacy parameters, 2) allows plugins to support old search method and new search method for users who haven't moved to latest/greatest GL immediately. Remember, some users will upgrade a plugin that works on GL x and GL x+1. If GL x is older than your new search code, the plugin writer has a harder time dealing with the change.
When I started the project it was my intention, like you say, to have two separate functions: plugin_dopluginsearch_ for the old API and plugin_advancedsearch_ for the new API. But when put into practice it just seemed to overly complicate things, both in the core and for the plugin. In the future, new plugin developers may be confused as to which one to use, and I was afraid that introducing a new function would add more redundant code to the plugins, e.g. if the advancedsearch function is found there is no need to execute the dopluginsearch function. Also, there are only 2 legacy parameters. So I figured the function name should stay the same and only the result is analysed to determine the API. When an instance of the Plugin class is returned, extra measures are carried out to extract the data from the class and convert it into something the new API understands. This is currently working, although it may still need tweaking, and it allows plugins on the old API to work with the core.
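The "analyse the result to determine the API" step could look roughly like the sketch below. The Plugin class here is only a stand-in so the example runs on its own (its property names are hypothetical), detect_search_api() is an invented name, and the array check assumes the associative-array return option rather than a new class.

```php
<?php
// Stand-in for the legacy class so the sketch runs on its own; the
// property names here are hypothetical, not Geeklog's.
class Plugin
{
    public $plugin_name = '';
    public $searchresults = array();
}

// Hypothetical core-side check: decide which API a plugin used by
// inspecting what its search function returned.
function detect_search_api($result)
{
    if ($result instanceof Plugin) {
        return 'legacy';   // results were already fetched by the plugin
    }
    if (is_array($result) && isset($result['sql'])) {
        return 'new';      // plugin handed back a query for the core to run
    }
    return 'unknown';
}
```

Keeping the decision in one place means the plugin function name never has to change.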
I am trying to sell the new API as "Don't execute the query return it instead!". So it should be fairly simple for developers to make the modifications although I understand that there may be some strange circumstances, which is what I am trying to find out in this post.
To handle users running newer plugins (on the new API) on older versions of GL a simple version test could decide what to do. If the SearchCriteria class exists then use it otherwise use the Plugin class. But how often do cases like this arise? If a user is running a plugin that was designed for GL x on GL x-1 I'm sure they will run into greater compatibility problems besides the search.
For similar reasons, leave the Plugin class alone for backward compatibility with plugin_dopluginsearch_$plugin. Turning what the plugin returns into an array or using an object is just a style issue. If you go with a class please call it SearchCriteria since that is what is being returned. YMMV. Returning as an array is more PHP4 friendly but is that something we need to still worry about? I don't know.
Agreed. The only worry I had with returning a new class is in the future it may turn out the same way the Plugin class has, just a container for data.
Also, it would be nice if plugins could return multiple SearchCriteria objects. Remember my comments from when you started. Some plugins maintain more than one type of object. If you are going to take the list of SearchCriteria and UNION all the sql together:
The UNION idea has been dropped, after doing the testing it gave substantially longer execution times. Each query is now executed individually one after the other appending the results to an array before being displayed. Would there really be a need to return multiple SearchCriteria objects? If a plugin is being searched that needs to pull data from multiple tables it can be done using joins. Can you name any plugins that maintain more than one type of object so I can see how to best accommodate them?
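A rough sketch of "execute each query individually and append the results to an array" follows. It is not the core's code: collect_results() is a hypothetical name, and $runQuery stands in for the database layer so the example does not need a real connection.

```php
<?php
// Hypothetical sketch: $runQuery stands in for the database layer and must
// return an array of rows for a given SQL string.
function collect_results(array $criteria, $runQuery)
{
    $all = array();
    foreach ($criteria as $c) {
        foreach (call_user_func($runQuery, $c['sql']) as $row) {
            $row['type'] = $c['name'];   // remember which plugin it came from
            $all[] = $row;
        }
    }
    return $all;
}
```

Sorting and paging would then be applied to the combined array rather than inside each query.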
Blaine
There are many plugins that still work just fine but are not actively supported.
This may not be an issue, if you're also suggesting that you will be updating all known plugins.
We really should not break existing APIs or functionality with any new upgrades - that should be the last option we consider.
Geeklog components by PortalParts -- www.portalparts.com
jmucchiello
The UNION idea has been dropped, after doing the testing it gave substantially longer execution times.
Can you name any plugins that maintain more than one type of object so I can see how to best accommodate them?
sbarakat
Would it be possible to provide a API function wrapper that would support older plugins calling the current search API - but calls the new API so we don't have issues with older plugins?
There are many plugins that still work just fine but are not actively supported.
This may not be an issue, if you're also suggesting that you will be updating all known plugins.
We really should not break existing APIs or functionality with any new upgrades - that should be the last option we consider.
I had said that I will be upgrading the plugins that are included with GL: calendar, links and staticpages. And when you include the stories and comments, that makes the default setup compatible with the new API. These plugins have already been upgraded.
If you want to see the backwards compatibility in action, pull the latest snapshot from Mercurial and install an old plugin. You may notice some missing fields, but that's how it works. Let me know if you come across any plugins that just flat out don't work, as these are the ones I am trying to narrow down. The legacy support is provided by the doSearch() function in the search.class file.
The issue here is like jmucchiello says what happens when a user runs the latest plugin (which returns SearchCriteria) on an old version of Geeklog that only supports Plugin.
My thinking is that these updates will be included in GL 1.5.1, and they will not break any of the plugins currently out there. Obviously it will take some time for plugins to migrate to the new API, and when they do they can reap the extra benefits. But when a plugin is upgraded, the developer does so to support advances in the GL core as well as the plugin's own updates. So the plugin's minimum requirement becomes GL 1.5.1, whereas the core supports plugins that were made since GL 1.3x.
If the developer wants to allow their plugins to run on older versions of GL they should provide backwards compatibility, wouldn't you agree? The core provides compatibility for legacy plugins, shouldn't plugins then support legacy versions of GL?
And as I said for a new plugin to provide support for older versions of Geeklog would just require a simple test. The process would be as follows:
1. Build the SQL string using the input parameters.
2a. If SearchCriteria exists (or if GL version is greater than 1.5.0) then initialise it and return the instance.
2b. Otherwise carry on as described in the Wiki.
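The test in steps 2a and 2b might be sketched like this. SearchCriteria is only the class name proposed in this thread, its methods are assumed to match the SearchPlugin examples earlier, and example_legacy_search() is a placeholder for whatever old-API code the plugin already has.

```php
<?php
// Placeholder for the plugin's existing old-API code path (hypothetical).
function example_legacy_search($sql)
{
    return 'legacy result for: ' . $sql;
}

// Steps 2a/2b. SearchCriteria is the class name proposed in this thread;
// its methods are assumed to mirror the SearchPlugin examples.
function example_search_result($sql)
{
    if (class_exists('SearchCriteria')) {
        $criteria = new SearchCriteria();
        $criteria->setName('example', 'Example Item');
        $criteria->setRank(3);
        $criteria->setSQL($sql);
        return $criteria;
    }
    // Older core: fall back to the old API.
    return example_legacy_search($sql);
}
```

On a core without the new class, the function quietly behaves like an old-API plugin.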
In theory the calendar plugin should search both events and personalevents when looking for matches. But it doesn't.
jmucchiello
The current calendar plugin works with the new API.
Blaine
Yes older plugins are supported, and have been since I started the project. The search page is backwards compatible.
Uh good
Now to your other original question: I would prefer option 2, so we have a new plugin class like:
$plugin = new SearchPlugin();
But I have questions, since I am not yet sure how the returned results will look when results from older and updated plugins are combined. Do you have some screenshots to show us? Most of us don't have the time to install and sort out issues with yet another instance of new code.
As I recall you have a large DB sample to run/test - so could you generate some screenshots showing:
- several plugins returning results in compatibility mode (as it is now)
- several plugins returning results in new search result mode only
- search results page with combination of older and newer search formatting returns
- plugin specific 'advanced mode - single plugin' showing newer api version formatted returns.
If there are several options for formatted returns - show us examples.
sbarakat
The current calendar plugin works with the new API.
Ahh yes sorry I see what you mean now. I will look into this and find a way to make it compatible with the new API. If you discover any other plugins that may also have a problem please let me know.
As I recall you have a large DB sample to run/test - so could you generate some screenshots showing:
- several plugins returning results in compatibility mode (as it is now)
- several plugins returning results in new search result mode only
- search results page with combination of older and newer search formatting returns
- plugin specific 'advanced mode - single plugin' showing newer api version formatted returns.
If there are several options for formatted returns - show us examples.
Yeah that does seem a better idea I will get on it and post them up soon.
sbarakat
The configuration options.
All the plugins using the new API
* Table style (sorting is done by clicking a column heading)
* Google style
All the plugins using the new API with "Display Result Number?" and "Display Result Type?" set to false.
* Table style
* Google style
Static Pages and Links plugins are using the old API, while Stories and Comments are using the new API
* Table Style
* Google style
As you can see, plugins that are using the old API will have the available data extracted and then added to the list. There are still a couple of issues that need sorting out with the backwards compatibility. For example, you will notice the total results found is not accurate when there are legacy plugins. Also, I just noticed while taking these screenshots that when using the advanced search options to only search the Links plugin (while it's running on the old API) it only returns a handful of results and does not include the paging at the bottom. These bugs will soon be fixed, but at least this gives you an idea of how the results page looks, and how flexible it can be.
jmucchiello
When the description displays as "not available" is that because the plugin returned not available or returned an empty string and the search code put not available in its place?
Dirk
Here are the screenshots.
They're looking great 8)
The only irritating thing is the "not available" text for when, I assume, the plugin didn't return any content to display. That should be toned down a bit or put in brackets or something to make clear it's not the actual content. But I'm sure you'll think of something.
Looking forward to having that on my Geeklog sites.
bye, Dirk
sbarakat
Why are the dates formatted without years? Is that your user's date format or is that format chosen by the search code?
When the description displays as "not available" is that because the plugin returned not available or returned an empty string and the search code put not available in its place?
sbarakat
The only irritating thing is the "not available"
hehe I missed your post. The "not available" can be changed quite easily. It is using one of the $LANG09 variables. How about displaying it in brackets and italics? Or it could even be replaced with a dash '-' or some other text. Or I could add it to the configuration options, allowing the admin to decide how to display it. Suggestions?
Also, are the configuration options enough? More could be added, like Default sort by, Disable sort, Disable limits etc. I didn't want to go overboard here, but it is quite easy to add the extra options if people want them.
randy
I second the "lookin' good" Sami. I particularly like the google-like output.
I would say make anything that seems like it should/can be configurable be configurable. So max # of pages, default sort, all header language, etc etc.
This way anyone can skin it the way they'd like.
sbarakat
The current calendar plugin works with the new API.
Ahh yes sorry I see what you mean now. I will look into this and find a way to make it compatible with the new API. If you discover any other plugins that may also have a problem please let me know.
I have spent the last couple of hours looking into this issue and I cannot work out why there are two tables in the first place. Both events and personalevents have exactly the same column names with the same structure, with the exception that personal_events has an extra 'uid' column and events has an extra 'hits' column. The events table also has some extra indexes. But apart from the two columns and the extra indexes both tables are _exactly_ the same.
Which makes me question why we don't simplify things and put it all into a single table: 'owner_id' in the events table can act as 'uid' from the personal table, and an extra column could be added to flag the event as personal or public. This would be a fairly simple modification.
Having two identical tables just seems like bad design to me...unless there is something I'm missing here. Having everything in a single table will make the searching a lot simpler.
So far I am not convinced that the search API should handle multiple objects being returned, as it would just add unnecessary processing that should be avoided by having better designed tables and SQL statements.
jmucchiello
Having two identical tables just seems to me as bad design...unless there is something I'm missing here. Having everything in a single table will make the searching a lot simpler.
So far I am not convinced that the search API should handle multiple objects being returned, as it would just add unnecessary processing that should be avoided by having better designed tables and SQL statements.
eg0master
I also agree that a plugin should be able to return several different types of items. For example, my FAQ plugin could return FAQ entries and FAQ categories as different objects. I also have a number of (currently) non-public plugins that definitely have more than one type of object, since they handle different but related things in the same plugin.
If you're rewriting the search API it feels like pretty bad design to limit each plugin to only one type of search item. It's almost as bad as if each plugin could only add one menu item to the admin/user menu...
Geeklog Plugins: http://plugincms.com
sbarakat
I agree the screenshots look nice. And regarding "nothing available" I would definitely want to customize that. Actually it makes most sense to me to show nothing at all (i.e. an empty string).
Already done 8) It will be a config option users can type in what they want or nothing at all.
I also agree that a plugin should be able to return several different types of items. For example, my FAQ plugin could return FAQ entries and FAQ categories as different objects. I also have a number of (currently) non-public plugins that definitely have more than one type of object, since they handle different but related things in the same plugin.
If you're rewriting the search API it feels like pretty bad design to limit each plugin to only one type of search item. It's almost as bad as if each plugin could only add one menu item to the admin/user menu...
Thanks for this.
The main reason I was against supporting multiple objects is that, now that only the SQL statement is returned, plugin developers are forced to optimize their queries, giving better performance overall in the search pages. Adding a join is better than using the id from sql_1 to look up sql_2 to return the result from sql_3....bad example, but I hope you see where I'm going with this. The calendar plugin is somewhat an example of what I mean.
It only requires 1 line of code to be changed in the core to support this, so it's not as if it's a difficult task. I just wanted to be certain that it was absolutely necessary, as I didn't want to encourage any badly designed SQL queries. As jmucchiello and yourself have pointed out, there may be some cases where it does seem necessary.
So scratch what's written earlier: the plugin can either return a single object or multiple ones, like so...
$plugin_ent = new SearchPlugin();
$plugin_ent->setName('faq', 'FAQ > Entries');
$plugin_ent->setRank(4);
$plugin_ent->setSQL($sql_ent);
$plugin_ent->setFTSQL($ftsql_ent);
//We could return just this
//return $plugin_ent;
$plugin_cat = new SearchPlugin();
$plugin_cat->setName('faq', 'FAQ > Categories');
$plugin_cat->setRank(4);
$plugin_cat->setSQL($sql_cat);
$plugin_cat->setFTSQL($ftsql_cat);
// Or if need be return both
return array($plugin_ent, $plugin_cat);
But note that the first parameter in the setName(), in this case 'faq', must stay the same across all objects.
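The single-line change mentioned above is not shown in the post, but one plausible form of it is to wrap a lone object in an array so the rest of the core always loops over a list. The helper name normalize_search_results() is hypothetical, and this sketch assumes the object-returning options, since with the associative-array option a single result would itself be an array.

```php
<?php
// Hypothetical core-side helper: accept one object or an array of them.
function normalize_search_results($result)
{
    return is_array($result) ? $result : array($result);
}
```

After this step the core can treat every plugin as if it had returned a list, whether it contains one entry or several.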
It would also be preferable if plugins use the same separator to distinguish the result sub group. As seeing something like this on the results page may not look that great:
Story
Comment
FAQ > Entry
FAQ > Entry
Calendar - Personal
Name -> Sub Group
So shall we say that if a sub group is going to be used then separate each item with the pipe character, 'FAQ|Entries', then a simple find and replace can be performed allowing the admin to choose the separator they want from the config pages.
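The find and replace described above could be as small as the following sketch. format_result_type() is a hypothetical name, and $separator stands in for a value that would come from the config pages.

```php
<?php
// Hypothetical: $separator would come from a config option.
function format_result_type($fullname, $separator = ' > ')
{
    // Plugins always use '|' between group and sub group; the admin's
    // chosen separator is swapped in at display time.
    return str_replace('|', $separator, $fullname);
}
```

For example, format_result_type('FAQ|Entries', ' - ') gives 'FAQ - Entries', and names without a pipe pass through unchanged.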
jmucchiello
Make the second parameter to setName accept an array and have the search API supply the separator from a config option. Also remember most of that text will come from the language arrays.
$plugin_cat->setName('story', 'Story');
$plugin_cat->setName('comment', 'Comment');
$plugin_cat->setName('faq', array('FAQ', 'Entry'));
$plugin_cat->setName('calendar', array('Calendar', 'Personal'));
$plugin_cat->setName('plugin', array($LANG_PL00['group'], $LANG_PL00['subgroup']));
Your code can just implode the thing.
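A minimal sketch of the implode suggestion, assuming the display name may arrive either as a plain string or as an array of group and sub group. join_result_type() is a hypothetical name, and $separator would be read from a config option.

```php
<?php
// Hypothetical: join an array name with the configured separator and
// leave a plain string untouched.
function join_result_type($fullname, $separator = ' > ')
{
    return is_array($fullname) ? implode($separator, $fullname) : $fullname;
}
```

Compared with the pipe convention, this keeps the separator out of the plugin's strings entirely, which suits names pulled from language arrays.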