Dec 08 2009

An Exercise in Wordpress Integration, or Why Wordpress Sucks

Published by blake under Code, PHP, Software Engineering

I'd like to preface my upcoming rant with the fact that Wordpress is good at what it does: making basic blogs and publishing content. I use it, many other people use it, it works. Heck, I'm using it right now. But from a technical standpoint, Wordpress sucks. I'm going to relate my experience here trying to write a quick function to store post output to a file, to be used by a separate application on the same server.

I started off to write a function (let's call it a caching function for simplicity) that stores some HTML from the most recently published post. Sounds easy enough. I should be able to just put a function into the functions.php file of the custom template set I'm using. That seems to be where the "userland" custom functions go.

So I check the function reference first. Hey, wp_get_recent_posts(). Looks promising, so I give it a shot. It goes ahead and gets the most recent post just fine. Things are ok so far.

A problem appears

Now, I want to output the post exactly as it would appear in the blog, and save that output to a file on disk. Surely there's a basic function that will output a post's content? You know... take the post_content field from the database record and format it properly? Suddenly, the skies darken. Evil laughter booms out. Ha ha ha! Wordpress mocks the folly of simplistic functional thinking!

The template files use functions like the_content() and the_title().  Just in case you can't tell from the excellent naming scheme, these actually produce echoed output. Checking out the_content(), we see it dutifully calls get_the_content(), then runs a couple of lines of formatting stuff on the results.  So how about using get_the_content() for my caching function? I could run the other few formatting bits manually after that. Should be ok, right? After all, the doc comment for get_the_content() says the following:

/**
* Retrieve the post content.
*

So, I can go ahead and assume it simply retrieves the basic post content then? Ha ha. NO. WHY WOULD IT DO THAT? Instead, it takes a bunch of globals that get set who-the-hell-knows-where, runs through a bunch of crap seemingly unrelated to the content of a post, and does a whole lot of textual modifications to some kind of content. Reading through the function is like jabbing red-hot fire pokeys into your eyes. Here's a portion of it:

PHP:
$content = $pages[$page-1];
if ( preg_match('/<\!--more(.*?)?-->/', $content, $matches) ) {
    $content = explode($matches[0], $content, 2);
    if ( !empty($matches[1]) && !empty($more_link_text) )
        $more_link_text = strip_tags(wp_kses_no_null(trim($matches[1])));

    $hasTeaser = true;
}

$pages is some kind of global that doesn't seem to have any relation to a post. Then apparently we're looking for HTML comments of <!--more something -->, and replacing them with... well, something. I'd hate to think what would happen if I ever wrote a post with an HTML comment in it that happened to hit on whatever random content markers Wordpress has decided to use. (Oh wait! That just happened to me while I was trying to publish the above code fragment!) I didn't even bother to look into things like wp_kses_no_null. It probably involves dark rituals with live chicken sacrifice. Why is there so much going on in a function called get_the_content()?
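As far as the fragment above shows, the match is used to split the content at the first more-comment into a teaser and a remainder, with any text inside the comment becoming the link label. Here is a standalone sketch of just that step. The name splitAtMoreTag() and its array keys are hypothetical, chosen for illustration; they are not WordPress functions, and the real function does considerably more.

```php
<?php
/**
 * Splits content at the first <!--more ...--> comment.
 * Hypothetical helper for illustration; not part of WordPress.
 */
function splitAtMoreTag(string $content): array
{
    if (preg_match('/<!--more(.*?)?-->/', $content, $matches)) {
        // Limit of 2: everything before the marker, everything after it.
        [$teaser, $rest] = explode($matches[0], $content, 2);
        return [
            'teaser' => $teaser,
            'rest'   => $rest,
            // Text inside the comment, e.g. "Read on" in <!--more Read on-->.
            'label'  => trim($matches[1] ?? ''),
        ];
    }
    // No marker found: the whole content is the teaser.
    return ['teaser' => $content, 'rest' => '', 'label' => ''];
}

$parts = splitAtMoreTag('Intro<!--more Read on-->Body');
```

Note that the function takes the content as a parameter and returns its result, so it can be called on any string at any time.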

In the end, it seems that get_the_content() will eventually get the content of a post, but only if you set a half-a-dozen or so globals before you call it.  And what the hell post is it even getting?

"The Loop"

Digging further, it's clear that the template functions for output are all like that. They don't take any kind of parameters; they just operate on globals! There's no way to take the post data that I just retrieved with wp_get_recent_posts(), and format it using these functions. You have to be in "The Loop" in order to do that. And "The Loop" sucks. It's not a catchy, easy-to-use method of handling posts, despite Wordpress's efforts to pass it off as something neat or fun. It's a mish-mash of global functions with random naming and variable schemes (incidentally, just like the rest of Wordpress). You can only use "The Loop" if you're accessing Wordpress in a "normal", web-requested-and-template-loaded kind of way. It doesn't work if you're outside a template file (such as in functions.php before a template gets loaded).

So back to square one. Unfortunately, it appears that if I want to have the regular blog-formatted output, I need to harness "The Loop" somehow, and clearly you can't do that on your own (i.e. outside of a template file) without knowing about every global variable in the system.

After some quick googling, I came across the query_posts() function, which you can use to set up "The Loop". Reading the documentation on it, you can find this little gem:

"The query_posts function overrides and replaces the main query for the page. To save your sanity, do not use it for any other purpose."

To paraphrase: "We've created a public API function that is pretty much useless except in a very specific page-dependent situation. Please enjoy how useless it is.  But don't use it."

The fact that there is a "main query" for a page is another indicator of just how global-happy Wordpress is, and that in turn gives you an insight into why it has so many security holes. How do you keep track of so many globals across so many functions?

A solution... sort of.

Fortunately, the query_posts() doc page links to the WP_Query docs, which are marginally more helpful, and provide the path for a solution. Using WP_Query sets up the wacky global stuff necessary to use "The Loop", which means we can hack our way through to getting some formatted post content. While technically feasible, you have to emulate a bunch of $_REQUEST parameters to the query() method.  I ended up with this:

PHP:
function cacheMostRecentPost()
{
    $featuredPosts = new WP_Query();
    $featuredPosts->query('showposts=1');
    while ($featuredPosts->have_posts())
    {
        $featuredPosts->the_post();
        //start buffering so the echoed template output can be captured
        ob_start();
        //do output with stuff like the_title() and the_content()
        $str = ob_get_clean();
        //write $str to cache fragment
    }
}

//set up hooks for this file when a post is changed or deleted
add_action('save_post', 'cacheMostRecentPost');
add_action('deleted_post', 'cacheMostRecentPost');

So despite relying on a specific setup of incoming HTTP parameters (as a string) for the most part, at least you can pass parameters to the query if you know the right ones. In this case, "showposts=1" seems to be the total number of posts fetched, and they appear to come back ordered by posting date, most recent first. This works for what I want it to do, but guess what? It doesn't work if you try to run it anywhere that's not one of those action hooks, because "The Loop" overwrites all the globals necessary for doing output later! So I can't use that function, say, at the top of the index.php template file if I wanted to. If I do, thanks to the overwritten globals, Wordpress decides that I actually want the "Archive" page instead of the index page(!), and switches templates accordingly. So while I achieved my goal of being able to cache a post to a file with this function, it's certainly not portable, and it's certainly not elegant.
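The capture step in that function depends on PHP's output buffering, since the template tags echo rather than return. A minimal sketch of that step in isolation follows, runnable without WordPress. Everything named here is an assumption for illustration: echo_title_stub() stands in for an echo-only tag such as the_title(), captureOutput() is a hypothetical helper, and the cache file path is made up.

```php
<?php
// Hypothetical stand-in for an echo-only template tag such as the_title().
function echo_title_stub(): void
{
    echo "<h2>Hello world</h2>\n";
}

/**
 * Runs $render with output buffering active and returns what it echoed.
 * The buffer is always discarded afterwards, even if $render throws.
 */
function captureOutput(callable $render): string
{
    ob_start();
    try {
        $render();
        return (string) ob_get_contents();
    } finally {
        ob_end_clean();
    }
}

$html = captureOutput('echo_title_stub');

// Assumed cache location; LOCK_EX guards against a reader seeing a partial write.
$cacheFile = sys_get_temp_dir() . '/latest-post.html';
file_put_contents($cacheFile, $html, LOCK_EX);
```

Wrapping the buffer in one helper keeps the ob_start() and ob_end_clean() calls paired, which matters because a stray open buffer would swallow later page output.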

Wharrgarbl

The entire code flow is mind-boggling. Basing the output functions around a bunch of globals reminds me of code someone would have written in PHP 3 a decade ago, or something a very inexperienced programmer would write. Definitely not something you would expect in an application used by what is probably now millions of people. What's wrong with having some data fetching functions, and some output functions? You could, and I know I'm talking crazy here, but you could fetch some data, and then pass it to the output functions. Then (bear with me here), you could probably fetch posts (or whatever) at any time, and get some formatted output at any time, without overwriting some important global that might be used later in the code flow. Revolutionary, I know. Sorry if I went too fast on that. I'll repeat it louder and/or slower for any Wordpress core developers that happen to be reading.

So, Wordpress? How about something like:

$postObjects = getRecentPostsByDate(1);
$output = formatPostContent($postObjects[0]);

The mere concept of having individual posts exist inside their own little encapsulated world would make the APIs a hundred times more useful (and easier to understand). You could even keep those crap the_title() and the_content() and the_something_lol_naming_scheme_lol() functions if you wanted. Just make them take parameters. Better yet, put them inside a formatting object, or even the post object itself. $post->the_content() would still work, but it would have context!
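To make the idea concrete, here is a minimal sketch of a post that carries its own context. The Post class, its methods, and formatPost() are all hypothetical; nothing like them exists in Wordpress, and the paragraph wrapping is a deliberately naive stand-in for real content formatting.

```php
<?php
// Hypothetical value object; WordPress has no class like this.
final class Post
{
    private $title;
    private $rawContent;

    public function __construct(string $title, string $rawContent)
    {
        $this->title = $title;
        $this->rawContent = $rawContent;
    }

    // Returns the title escaped for HTML instead of echoing it.
    public function title(): string
    {
        return htmlspecialchars($this->title, ENT_QUOTES, 'UTF-8');
    }

    // Wraps each blank-line-separated paragraph in <p> tags and returns the result.
    public function content(): string
    {
        $paragraphs = preg_split('/\R{2,}/', trim($this->rawContent));
        $wrapped = array_map(static function (string $p): string {
            return '<p>' . $p . '</p>';
        }, $paragraphs);
        return implode("\n", $wrapped);
    }
}

// Formatting takes the post as a parameter; no globals are read or written.
function formatPost(Post $post): string
{
    return '<h2>' . $post->title() . "</h2>\n" . $post->content();
}

$html = formatPost(new Post('Fish & chips', "First paragraph.\n\nSecond paragraph."));
```

Because nothing here touches shared state, two posts can be formatted back to back, in any order, from a template or from functions.php alike.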

The reason this gets me worked up is not that it's so frustrating to use (although that helps). I've had to deal with a lot of frustrating code in my career. It's more the fact that it's this kind of thing that gives PHP programmers a bad name. The code is just bad. The design is random. The API functions are random. The naming schemes are random. Functions don't do what their name (or their doc comment) indicates they should do. Integrating Wordpress into another application or site is next to impossible (try it, I dare you), and the other way around, integrating another application or site into Wordpress is much more difficult than it should be. Global usage is rampant and ridiculous to follow.

You don't have to look any farther than a single Wordpress code file to understand why there have been so many security holes over the last couple of years. And there's a lot of PHP code out there that's the quality of Wordpress, or worse.

To re-iterate my opening, if you don't need to get anything special out of it, Wordpress does the job. They've filled their market niche well, and it's encouraging that development is ongoing and releases occur often. I've worked with it on occasion over the last few years, and the improvements are obvious, interface-wise especially, and to some extent code-wise as well (the WP_Query object is a step forward).  But working with the code is not fun.  Even modifying the template files is an exercise in counter-intuitiveness.

I'm sure there are reasons the code is what it is at this point, and I'm equally sure I don't have the full picture to go with my condemnations. I guess I should just be thankful that I don't have to maintain it.

14 responses so far

Aug 06 2009

ASP.Net MVC – How to route to images or other file types

Published by morgan under .Net, ASP.Net, ASP.Net MVC, Code, MVC

A recent question on Stack Overflow (and subsequent answer that I wrote for it) inspired this post. I had recently been discussing URL rewriting in depth with my brother, and have also been doing some introductory work with the routing engine in ASP.Net MVC, and the question piqued my interest since I had been meaning to look at this more closely for some time.

The question on Stack Overflow is titled "How do I route images with ASP.Net MVC", but fundamentally the question is really asking "how can I use ASP.Net MVC to re-route URLs to actual physical files, rather than methods of a controller?"

To be clear, let's address the conceptual differences between routing and URL rewriting.  URL rewriting takes the requested URL and modifies it before your code ever sees it.  As far as your application is concerned, the client requested the rewritten URL.  All that URL rewriting does is to change one URL into another URL, based on pattern matching.

Routing is a different and much more powerful beast.  The ASP.Net routing engine maps a URL to a "resource", based on a set of routes.  The first route to match the requested URL wins the prize, and sends the request off to the resource it chooses.  For the ASP.Net MVC framework (which uses System.Web.Routing under the hood), a resource is something that can handle the request object, which is always a piece of code.

So where does that leave physical files?  If a request is always parsed by the routing engine and then handed off to some function somewhere, how can we ever route a request for an image to actually return the physical image?

Well, it takes a tiny bit of legwork, but once we're through it, I'm confident you will see the huge advantages that routing has over simple URL rewriting.  We will show the equivalent of URL rewriting by handling a request for an image using a URL that doesn't map to a physical path, and returning the image anyway.

Handling the Request

First off, we need to handle the request that we want to re-route to a physical file.  Out of the box, ASP.Net MVC uses an instance of the MvcRouteHandler object to handle every request.  MvcRouteHandler hides all the complexities of taking the requested URL, breaking it down into parts, finding the right controller in your application, instantiating it and passing it all the data it needs.

The end result of MvcRouteHandler is not what we desire. We want to return an image, not instantiate a controller and run a method.   We want to skip dealing with controllers altogether in this case.  So let's create our own route handler that we'll use instead.

To do so, we simply implement IRouteHandler, an interface from System.Web.Routing whose job is to hand back an IHttpHandler for the request.  This means that what we're writing is the ASP.Net MVC equivalent of an .ashx file for a webforms app - we're inserting our own handling code into the ASP.Net pipeline, which will handle the request much closer to the webserver/http level, rather than at the ASP.Net application level.

IRouteHandler only has one method that we need to implement, which is GetHttpHandler().

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Web;
using System.Web.Compilation;
using System.Web.Routing;
using System.Web.UI;

namespace MvcApplication1
{
    public class ImageRouteHandler : IRouteHandler
    {
        public IHttpHandler GetHttpHandler(RequestContext requestContext)
        {
            string filename = requestContext.RouteData.Values["filename"] as string;

            if (string.IsNullOrEmpty(filename))
            {
                requestContext.HttpContext.Response.Clear();
                requestContext.HttpContext.Response.StatusCode = 404;
                requestContext.HttpContext.Response.End();
            }
            else
            {
                requestContext.HttpContext.Response.Clear();
                requestContext.HttpContext.Response.ContentType = GetContentType(requestContext.HttpContext.Request.Url.ToString());

                // find physical path to image here.  
                string filepath = requestContext.HttpContext.Server.MapPath("~/test.jpg");

                requestContext.HttpContext.Response.WriteFile(filepath);
                requestContext.HttpContext.Response.End();
            }
            return null;
        }

        private static string GetContentType(String path)
        {
            switch (Path.GetExtension(path))
            {
                case ".bmp": return "Image/bmp";
                case ".gif": return "Image/gif";
                case ".jpg": return "Image/jpeg";
                case ".png": return "Image/png";
                default: break;
            }
            return "";
        }
    }
}

The above IRouteHandler is pretty simple.  Ignoring the GetContentType helper method, there are really only two things happening.  First, we check for a "filename" parameter that got passed in to our handler (more on that in a second).  If it's not there, we return a 404 response.  Otherwise, we attempt to open up the physical file "test.jpg", and stream it to the browser.

Clearly, this should be adapted to your needs by actually using the filename parameter to find the physical files on your system.   But moving on - how do we invoke this from our MVC app?  And how do we pass in the filename parameter, which we'd like to reroute to some other physical path?

Routing the Request to the Custom Handler

Well, this is the easy part.  Where you'd normally define your routes in Global.asax, simply use routes.Add(), instead of routes.MapRoute().  Just like this:

routes.Add("ImagesRoute",
                 new Route("graphics/{filename}", new ImageRouteHandler()));

This method of adding our route allows us to specify our custom IRouteHandler, rather than routes.MapRoute(), which by default uses an instance of MvcRouteHandler.  So now, we've defined a route that matches any requested URL of the form "graphics/<something>", puts the rest of the URL into the "filename" entry of the RouteData.Values dictionary, and hands it off to our IRouteHandler.  This is how we pass the filename parameter into our custom route handler - basically the same way we pass things into controllers, by defining the variables in the route pattern.
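The two ideas at work here, first matching route wins and {placeholder} segments become named values, can be shown with a small language-neutral sketch. It is written in PHP, like the other listings on this page, and is emphatically not how System.Web.Routing is implemented; matchRoute(), route(), and the "Default" pattern are assumptions made up for the illustration.

```php
<?php
/**
 * Matches one pattern such as "graphics/{filename}" against a path.
 * Returns the captured placeholder values, or null if the route does not match.
 */
function matchRoute(string $pattern, string $path): ?array
{
    $patternParts = explode('/', trim($pattern, '/'));
    $pathParts = explode('/', trim($path, '/'));
    if (count($patternParts) !== count($pathParts)) {
        return null;
    }
    $values = [];
    foreach ($patternParts as $i => $part) {
        if (preg_match('/^\{(\w+)\}$/', $part, $m)) {
            // Placeholder segment: capture whatever the path has here.
            if ($pathParts[$i] === '') {
                return null;
            }
            $values[$m[1]] = $pathParts[$i];
        } elseif (strcasecmp($part, $pathParts[$i]) !== 0) {
            // Literal segment: must be equal (compared case-insensitively).
            return null;
        }
    }
    return $values;
}

/** Tries the routes in registration order; the first match wins. */
function route(array $routes, string $path): ?array
{
    foreach ($routes as $name => $pattern) {
        $values = matchRoute($pattern, $path);
        if ($values !== null) {
            return ['route' => $name, 'values' => $values];
        }
    }
    return null;
}

$routes = [
    'ImagesRoute' => 'graphics/{filename}',
    'Default'     => '{controller}/{action}',
];
$hit = route($routes, 'graphics/logo.png');
```

In this sketch "graphics/logo.png" would also satisfy the Default pattern, so registration order decides the outcome, which is why a specific route like the images one belongs ahead of the catch-all.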

We've successfully routed all URLs under "graphics/", a path which doesn't physically exist in our web application, and returned "test.jpg", which could exist anywhere.  With a bit of coding around the file IO, you could return files from anywhere.

And that's pretty much it!  You might be thinking, "this seems like a lot of extra work just to re-route a URL to a physical file that already existed in my web app!".   If you take a step back though, you'll see the power of this approach.  What if you wanted to log every request to the original URL to a special log file?  What if you wanted to also transform the image before returning it?  Perhaps launch a system executable or asynchronously hit a web service?  What if you wanted to...?

In a nutshell, by inserting your own HttpHandlers into the ASP.Net pipeline to handle routed requests, you can code anything that you'd like to happen when a request comes in, rather than just rewriting it to some other URL.

3 responses so far

Jul 19 2009

Facebook Notifications – "An unknown error occurred (out of memory)"

Published by blake under PHP

Over the last few months, I've worked with the Facebook Notification system many times, and there has always been a moderate-to-high level of frustration with it. It is difficult to test on a development application, because you simply don't have the same number of users as a live application, and the application settings on a dev app are different, which affects your allocation of allowed notifications.

One of the prevalent errors I was getting for a long time was that a notification sent to a large number of users (application-to-user notification) would invariably fail with an Exception thrown from the Facebook PHP client library. My notification controller code in a particular application was set up to catch Exceptions and display the error into the administration control panel. However, "An unknown error occurred (out of memory)" is about as helpful as you might imagine. Yay Facebook. The only thing going for this particular error is that it's an iota more helpful than Facebook's documentation, which you quickly learn to distrust as out-of-date or just plain wrong.

The logic

My notification function logic was fairly simple: Accept an array of Facebook user ids, run an FQL query against Facebook to make sure the users all had the application installed, remove any ids that didn't, then use the Notifications.send() method (via the PHP client library's $facebook->api_client->notifications_send() method) to fire off the message.

No problems on development. On production, it was a giant party of "An unknown error occurred (out of memory)" errors.

At first I thought that Notifications.send() couldn't handle a lot of user ids (I was giving it at least 5000), so I put a loop in to break it up into several small "chunk sends". I went all the way down to 500 users per request, and nothing seemed to change, other than I was probably wasting my allocated notifications. The only silver lining to doing this testing on a production server was that the failed notifications didn't seem to go out to anyone, although that's impossible to tell for sure.

It's not the size of the notification, it's how you verify it.

I struggled with this for a long while until I realized that there was only one other possible location for Facebook to throw this error, and that was with the FQL query. I moved my "chunking" loop to work with the FQL as well, and I finally had relief from the "unknown" error. My conclusion therefore is that FQL queries might not be able to handle a large result set.

Since the documentation and forums couldn't give me any help, I thought I'd post my working code and explanation here, in the hopes that someone might find it useful.

Note that the Facebook object has been previously set as a property of the Controller that is executing this function, so it is accessed via $this->facebook here. You can see how the original array is broken into separate requests of approximately 500 users each, with each chunk of users being verified in turn. I have no idea what a safe number of users would be; however, 500 seems to be working for me (for now).

/**
 * Sends a Facebook Notification message to all given users of an application.
 *
 * @param array $facebookIds Array of Facebook IDs to send the notification to.
 * @param string $msg Notification message to send.
 * @return int Count of how many users were sent the message.
 * @link http://wiki.developers.facebook.com/index.php/Notifications.send
 */
public function notifyMultipleUsers($facebookIds, $msg)
{
	if (! is_array($facebookIds) || count($facebookIds) == 0) {
		throw new Exception('No user ids given to notifyMultipleUsers().');
	}
	$reqSize = 500;
	$numUsers = 0;
	while(count($facebookIds) > 0) {
		$tmpIds = array_splice($facebookIds,0,$reqSize);
		//Check against Facebook database to verify which users have the app installed.
		$fql = 'SELECT uid FROM user '.
		       'WHERE is_app_user=1 AND uid IN ('.implode(',',$tmpIds).')';
		$fqlRes = $this->facebook->api_client->fql_query($fql);
		$appUsers = array();
		foreach($fqlRes as $a) {
			$appUsers[] = $a['uid'];
		}
		$c = count($appUsers);
		if ($c > 0) {
			$this->facebook->api_client->notifications_send(
				implode(',',$appUsers),$msg,'app_to_user'
			);
			$numUsers += $c;
		}
	}
	return $numUsers;
}
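
The same chunk, verify, send loop can also be written around PHP's array_chunk(), which leaves the caller's array untouched. In this sketch the two callbacks are hypothetical stand-ins for the FQL check and the notifications_send() call, so it runs without the Facebook client library; notifyInChunks() is likewise an invented name.

```php
<?php
/**
 * Verifies and notifies users one chunk at a time.
 * $filterAppUsers receives a chunk of ids and returns those that have the app.
 * $send receives the verified ids for one chunk.
 * Returns how many users were sent the message.
 */
function notifyInChunks(array $ids, int $chunkSize, callable $filterAppUsers, callable $send): int
{
    $sent = 0;
    foreach (array_chunk($ids, $chunkSize) as $chunk) {
        $appUsers = $filterAppUsers($chunk);
        if (count($appUsers) > 0) {
            $send($appUsers);
            $sent += count($appUsers);
        }
    }
    return $sent;
}

// Usage with stubs: pretend only even ids have the app installed,
// and record the size of each batch instead of calling Facebook.
$batches = [];
$count = notifyInChunks(
    range(1, 1200),
    500,
    static function (array $chunk): array {
        return array_values(array_filter($chunk, static function (int $id): bool {
            return $id % 2 === 0;
        }));
    },
    static function (array $appUsers) use (&$batches): void {
        $batches[] = count($appUsers);
    }
);
```

Passing the verify and send steps in as callbacks also makes the chunk size easy to experiment with on a development app, where the real API limits are hard to reproduce.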

One response so far

Apr 25 2009

.Net Mocking Frameworks – Capability Comparison

I have a couple years of experience of TDD under my belt, but it's only recently that I've felt like I am a relatively decent practitioner of it.  I attribute this to forcing myself to take the plunge into mocking, and the knowledge of patterns and loosely-coupled design that I've gained from it.

You see, I work on a pretty large and complex ASP.Net webforms product, and tests were introduced late into the development cycle of the initial release.  We favored integration testing with real data sources over actual unit tests. While I did write some tests, I knew that our product was not very testable by design.

Recently we put together a public-facing API for programming against the product, which we were able to build from scratch.  This was a natural opening to apply test-driven practices and start building unit tests from the get-go.  Due to the service-oriented nature of the data that the product consumes, I soon found myself realizing that I needed mocks in a big way.  I gritted my teeth and dove head-first into Rhino Mocks.

Several weeks down the road, it was obvious that Rhino just isn't right for our environment.  The learning curve is too steep to win quick success with all our developers (and therefore by extension, our project managers).  I began looking for another framework.

This led me on a frustrating research mission to find the differences between frameworks without actually trying them all. Due to a lack of in-depth comparisons of the actual capabilities of the various mocking frameworks, I've ended up putting together this chart.  I'm hoping it helps some people.  My interest is very high in this arena, so I will be keeping it up to date.

(Just for interest's sake - we ended up choosing Moq, due to the ease of the API.   The philosophy of Moq jibes perfectly with our relatively fast-paced and efficient development environment.  We don't care about purism - we care about getting it done.)

.Net Mocking Framework Comparison

[Chart image: .Net Mocking Framework Comparison]

Other Notes

Rhino

  • Mature, flexible framework
  • Built on Castle DynamicProxy
  • Very large array of "syntaxes" leads to extreme confusion when writing tests - documentation mixed for the new 3.5 fluent syntax
  • Large community of users

Moq

  • New(ish) framework also built on Castle
  • Requires .Net 3.5 due to its lambda heavy syntax
  • No distinction between mocks & stubs, record/playback (joy!!)
  • Responsive and active developers and community

NMock2

  • Uses "magic strings" for mocking - makes tests brittle
  • Confusing product version #'s - NMock2 is actually a new team that picked up NMock and continued development

TypeMock.Net

  • Powerful framework that uses redirection at the IL level to create mocks
  • Expensive ($450/license)
  • TDD Purists argue that the power of it leads to poorer design

The above observations were gleaned from many web pages, documentation repositories and blog posts, and reflect my limited understanding of each framework.   There may be errors.  If you think I'm wrong, or you can think of other capability aspects that I've overlooked, please let me know!

7 responses so far
