




Accents and special characters in search


11 replies to this topic

#1 Chris Guillou

Chris Guillou

    Member

  • Members
  • 10 posts

Posted 25 July 2012 - 12:08 PM

What do you know ? It's me again !

I've come across a problem with the interactive search fields as you can see here : http://www.ville-sev...videotheque.php If you search for words such as "Sèvres" the characters are encoded and search fails.

I've tried using "utf8_decode" but it doesn't seem to work, has someone had to use the search function with this type of alphabet by any chance ?

Cheers,

Chris

#2 eric

eric

    Lead Developer

  • TubePress Staff
  • 2787 posts

Posted 28 July 2012 - 11:02 AM

Bonjour Chris,

That's odd. When I try the same shortcodes from tubepress.org, the characters do not get encoded: http://tubepress.com/cguillou/. The only important difference I can see between our sites is that yours is (appropriately) served with the ISO-8859-1 character encoding, while tubepress.org uses UTF-8. But TubePress shouldn't care what encoding you use to display pages, so this could be a red herring.

Warning! The next few paragraphs are pretty geeky. I'm going into detail to both 1) make things clear in my head and 2) have a reference for anyone else with this issue. :)

I looked at your debug output and compared it to my debug output. Both sites are able to interpret the "è" - both debug outputs show

Accepted valid value: tagValue = Sèvres
But the difference arises when TubePress makes its request to YouTube to return the search results. Your debug output shows a URL of

http://gdata.youtube.com/feeds/api/videos?q=S%E8vres&author=mairiesevres&v=2&start-index=1&max-results=6&safeSearch=moderate&format=5
and mine shows

http://gdata.youtube.com/feeds/api/videos?q=S%C3%A8vres&author=mairiesevres&v=2&start-index=1&max-results=6&safeSearch=moderate&format=5
Notice that tubepress.org encoded the "è" as "%C3%A8", while your site encoded it as "%E8". Digging into the RFC on URI syntax (RFC 3986), I read that

When a new URI scheme defines a component that represents textual
data consisting of characters from the Universal Character Set [UCS],
the data should first be encoded as octets according to the UTF-8
character encoding [STD63]; then only those octets that do not
correspond to characters in the unreserved set should be percent-
encoded. For example, the character A would be represented as "A",
the character LATIN CAPITAL LETTER A WITH GRAVE would be represented
as "%C3%80", and the character KATAKANA LETTER A would be represented
as "%E3%82%A2".


I believe this explains things. Your server accepted the "è" under ISO-8859-1 encoding, but TubePress failed to re-encode it as UTF-8 before running it through urlencode(). tubepress.org was already using UTF-8, so it didn't exhibit the problem.
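
To make the difference concrete, here is a minimal sketch (an illustration only, not taken from the TubePress source) of how PHP's urlencode() treats the same word in the two encodings. The byte strings are written with escape sequences so the result does not depend on how the file itself is saved.

```php
<?php
// "è" as raw bytes in each encoding.
$latin1 = "S\xE8vres";     // ISO-8859-1: one byte, 0xE8
$utf8   = "S\xC3\xA8vres"; // UTF-8: two bytes, 0xC3 0xA8

// urlencode() percent-encodes bytes; it knows nothing about character encodings.
echo urlencode($latin1), "\n"; // S%E8vres
echo urlencode($utf8), "\n";   // S%C3%A8vres
```

The first form matches the URL in Chris's debug output and the second matches the one from tubepress.org.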

In short, I think this is a TubePress bug. What happens if we change sys/classes/org/tubepress/api/url/Url.class.php lines 392 and 396 from

    $name = urlencode($name);
    if (! is_null($value)) {
        $parts[] = $name . '=' . urlencode($value);

to

    $name = urlencode(utf8_encode($name));
    if (! is_null($value)) {
        $parts[] = $name . '=' . urlencode(utf8_encode($value));

I know you said you already tried utf8_decode(), but I'm not sure whether you tried utf8_encode() in these two spots. With luck, that will solve the issue entirely. If it doesn't, I'm definitely interested in setting up a test site to resolve this. Let me know either way.

Thanks! And sorry for the rambling post..

#3 Chris Guillou

Chris Guillou

    Member

  • Members
  • 10 posts

Posted 29 July 2012 - 11:14 AM

Wow ! Thanks for the effort put into the answer Eric .
I was afraid this was going to have something to do with the ISO encoding of our pages...

Unfortunately after having modified the lines you mentioned in sys/classes/org/tubepress/api/url/Url.class.php the problem remains :-(

Just to double-check with you here is the code used to call up the search and results :

<div class="tubepress-search">
<?php
print  utf8_decode(TubePressPro::getHtmlForShortcode('output="searchInput" searchProvider="youtube" searchResultsUrl="bloc_txt1"'));
?>

</div>

<style>
.tubepress_thumb {
	height: 220px;
	line-height: 1.1em;
	font-size: 11px;
}

.tubepress_embedded_title {
font-size: 22px;
margin-top: .3em;
}	

</style>


<div id="resultsDiv">
<H2>Vos résultats</H2>
<?php
print utf8_decode(TubePressPro::getHtmlForShortcode('output="searchResults" searchResultsOnly="true" searchResultsRestrictedToUser="mairiesevres" resultsPerPage="bloc_txt2" paginationAbove="false" paginationBelow="true" thumbWidth="190" thumbHeight="145" playerLocation="shadowbox"'));
?>
</div>

Something strange I thought I might mention is that even if I call up the search URL this way : http://www.ville-sev...v/...rch=sevres I do ...


Important Edit : Just found out that I actually have the same problem with Tag filtering, this page should be filtering on "sèvres" : http://www.ville-sev...v/...debug=true and as you can see it doesn't work (If I try a non-accent tag it works fine..)
Here's the code for the gallery :

<style>
.tubepress_thumb {
	height: 200px;
	line-height: 1.1em;
	font-size: 11px;
}

.tubepress_embedded_title {
font-size: 22px;
margin-top: .3em;
}	

</style>


<div class="tubepress-gallerie">
<?php 

print utf8_decode(TubePressPro::getHtmlForShortcode('mode="tag" searchResultsRestrictedToUser="mairiesevres" tagValue="bloc_txt2" orderBy="newest" resultsPerPage="bloc_txt1" paginationAbove="false" paginationBelow="true" ajaxPagination="true" thumbWidth="190" thumbHeight="145"'));

?>

</div>

It really seems to break it all up since the number of videos per page or Ajax pagination don't work either with "Sèvres" (here's a working test : http://www.ville-sev...t/...eo_tag.php )



Thanks again for your help

#4 eric

eric

    Lead Developer

  • TubePress Staff
  • 2787 posts

Posted 01 August 2012 - 09:20 PM

Could you post and/or send me the output of phpinfo()? I'm trying to reproduce this issue in my test environment but can't seem to get the correct PHP configuration.

Unfortunately after having modified the lines you mentioned in sys/classes/org/tubepress/api/url/Url.class.php the problem remains :-(


Darn. Well thanks for trying it anyway. Feel free to revert to the original code, since according to the debug output that actually made the problem much worse!
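
One plausible reason a blanket utf8_encode() call could make things worse is double encoding: a value that is already UTF-8 has its bytes reinterpreted as ISO-8859-1 and encoded a second time. This is an assumption, not something the debug output in this thread confirms. The sketch below shows the effect and a more defensive alternative. latin1ToUtf8() and ensureUtf8() are hypothetical helpers, not TubePress functions; the first reimplements the ISO-8859-1 to UTF-8 mapping by hand so the example needs no extensions.

```php
<?php
// Hypothetical helper: ISO-8859-1 bytes -> UTF-8 (the same mapping utf8_encode() performs).
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $code = ord($bytes[$i]);
        // ASCII passes through; 0x80-0xFF becomes a two-byte UTF-8 sequence.
        $out .= $code < 0x80
            ? $bytes[$i]
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    return $out;
}

// Hypothetical helper: convert only when the input is not already valid UTF-8.
// preg_match() with the /u modifier returns false on malformed UTF-8.
function ensureUtf8(string $value): string
{
    return preg_match('//u', $value) === 1 ? $value : latin1ToUtf8($value);
}

$alreadyUtf8 = "S\xC3\xA8vres";
$latin1      = "S\xE8vres";

echo urlencode(latin1ToUtf8($alreadyUtf8)), "\n"; // S%C3%83%C2%A8vres (double-encoded)
echo urlencode(ensureUtf8($alreadyUtf8)), "\n";   // S%C3%A8vres
echo urlencode(ensureUtf8($latin1)), "\n";        // S%C3%A8vres
```

The validity check is only a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so knowing the real source encoding is always safer than guessing.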

Just found out that I actually have the same problem with Tag filtering


This makes sense. Internally, interactive search and "tag-based" galleries use the same code.

We'll figure this out!

#5 Chris Guillou

Chris Guillou

    Member

  • Members
  • 10 posts

Posted 02 August 2012 - 02:57 AM

Thanks Eric,

Url.class.php is back in its original form and here is the phpinfo : http://www.ville-sev...fr/tp//info.php

We will prevail !

Chris

#6 eric

eric

    Lead Developer

  • TubePress Staff
  • 2787 posts

Posted 04 August 2012 - 07:39 PM

I was finally able to reproduce the issue and I think I have a fix (at least, a fix that works for me):

  • Open up sys/ui/themes/default/search/search_input.tpl.php with your favorite text editor
  • Edit line 25. Change it from

    <form method="get" action="<?php echo ${org_tubepress_api_const_template_Variable::SEARCH_HANDLER_URL}; ?>">
    to

    <form accept-charset="utf-8" method="get" action="<?php echo ${org_tubepress_api_const_template_Variable::SEARCH_HANDLER_URL}; ?>">
In my testing, anyway, this forced the form to encode the GET params as UTF-8 before sending it off to the server. Give that a try? I'm feeling cautiously optimistic...
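
For anyone adapting their own templates, the same idea can be sketched outside TubePress. renderSearchForm(), the field name q, and the /search.php path are hypothetical, chosen only for illustration. The relevant part is the accept-charset attribute, which tells the browser which encoding to use when it submits the form.

```php
<?php
// Hypothetical helper, not part of TubePress: returns a GET search form whose
// values the browser is asked to submit as UTF-8, regardless of page encoding.
function renderSearchForm(string $actionUrl, string $fieldName = 'q'): string
{
    // Escape attribute values; an ASCII-only URL and field name are assumed here.
    $action = htmlspecialchars($actionUrl, ENT_QUOTES);
    $name   = htmlspecialchars($fieldName, ENT_QUOTES);

    return '<form accept-charset="utf-8" method="get" action="' . $action . '">'
         . '<input type="text" name="' . $name . '" />'
         . '<input type="submit" value="Search" />'
         . '</form>';
}

$html = renderSearchForm('/search.php'); // placeholder path
echo $html, "\n";
```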

#7 Chris Guillou

Chris Guillou

    Member

  • Members
  • 10 posts

Posted 06 August 2012 - 03:03 AM

Welll aaaaaaaalrighty then.

We've got some good news : http://www.ville-sev...ages/v/...èvres works just fine ! :-)

We've got bad news : http://www.ville-sev...v/...alerie.php which is a tag filtered gallery on "sèvres" doesn't :-(

And then it might be worth noting that Ajax Pagination display gets messed up following first click : http://www.ville-sev...t/...eo_tag.php (No filtering apart from user)


Thanks Eric.

#8 eric

eric

    Lead Developer

  • TubePress Staff
  • 2787 posts

Posted 06 August 2012 - 12:58 PM

OK! I'm happy with baby steps, as long as we're making progress. I've committed this first fix so it will be released with the next version of TubePress.

We've got bad news : http://www.ville-sev...fr/ewb_pages/v/ ... alerie.php which is a tag filtered gallery on "sèvres" doesn't :-(


Hmm, I'm having trouble (again) reproducing this error. My test server's PHP configuration is identical to yours, but when I run the same shortcode, I get a correctly tag-filtered gallery. It seems that TubePress is not correctly parsing the shortcode on your server. To help me test, could you create a new PHP file somewhere on your server, maybe call it tubepress-test.php, with the following contents:

<?php
$pattern = '/(\w+)\s*=\s*"([^"]*)"(?:\s*,)?(?:\s|$)|(\w+)\s*=\s*\'([^\']*)\'(?:\s*,)?(?:\s|$)|(\w+)\s*=\s*([^\s\'"]+)(?:\s*,)?(?:\s|$)/';
$text = 'mode="tag" searchResultsRestrictedToUser="mairiesevres" tagValue="sèvres" orderBy="newest" resultsPerPage="8" paginationAbove="false" paginationBelow="true" thumbWidth="190" thumbHeight="145" playerLocation="shadowbox"          description="true" descriptionLimit="125"';
$text = preg_replace('/[\x{00a0}\x{200b}]+/u', ' ', $text);
$text = str_replace(array('‘', '’', '′'), '\'', $text);
$text = str_replace(array('"', '“', '”', '″'), '"', $text);
$result = preg_match_all($pattern, $text, $match, PREG_SET_ORDER);
var_dump($pattern);
echo "\n";
var_dump($text);
echo "\n";
var_dump($result);
echo "\n";
var_dump($match);
This is a collection of snippets from TubePress's shortcode parser. I'm hoping it will give me some insight into what's happening on your server.

And then it might be worth noting that Ajax Pagination display gets messed up following first click : http://www.ville-sev...fr/ewb_pages/t/ ... eo_tag.php (No filtering apart from user)


Let's finish up this other problem, then we'll come back to this one. Sound good?

Thanks again for your patience with this!

#9 Chris Guillou

Chris Guillou

    Member

  • Members
  • 10 posts

Posted 06 August 2012 - 01:19 PM

Eric

Is missing in your code ?

Inserted here :

http://www.ville-sev.../tp/test_tp.php
http://www.ville-sevres.fr/tp/tp.php


Thanks

Chris

#10 eric

eric

    Lead Developer

  • TubePress Staff
  • 2787 posts

Posted 07 August 2012 - 04:43 PM

Hi Chris,

I was able to reproduce the issue and think I have a fix (again, at least that works for me). The cause is that when the PHP source file is encoded in something other than UTF-8, TubePress doesn't correctly parse the shortcode. Here's how to fix this:

  • Open up sys/classes/org/tubepress/impl/shortcode/SimpleShortcodeParser.class.php with your favorite text editor
  • Change line 82 from

    $text    = preg_replace('/[\x{00a0}\x{200b}]+/u', ' ', $matches[1]);
    to

    $text    = preg_replace('/[\x{00a0}\x{200b}]+/u', ' ', utf8_encode($matches[1]));
Fingers crossed, that should fix the tag-based gallery. Give it a try and let me know? Thanks!
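
Here is a short sketch of why the conversion matters at this step. It assumes, which the thread does not state outright, that the failure comes from PCRE's /u modifier: with /u, preg_replace() returns null when the subject is not valid UTF-8, so a shortcode containing a raw ISO-8859-1 "è" would be lost before it is ever parsed. latin1ToUtf8() is a hypothetical stand-in for utf8_encode(), written by hand so the example runs without extensions.

```php
<?php
// Hypothetical stand-in for utf8_encode(): ISO-8859-1 bytes -> UTF-8.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $code = ord($bytes[$i]);
        $out .= $code < 0x80
            ? $bytes[$i]
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    return $out;
}

$pattern   = '/[\x{00a0}\x{200b}]+/u';  // same pattern as the parser line above
$shortcode = "tagValue=\"s\xE8vres\"";  // "è" as a single ISO-8859-1 byte

// With /u, an invalid UTF-8 subject makes preg_replace() fail and return null.
$broken    = preg_replace($pattern, ' ', $shortcode);
$errorCode = preg_last_error();
var_dump($broken);                             // NULL
var_dump($errorCode === PREG_BAD_UTF8_ERROR);  // bool(true)

// Converting to UTF-8 first lets the same call succeed.
$fixed = preg_replace($pattern, ' ', latin1ToUtf8($shortcode));
var_dump($fixed === "tagValue=\"s\xC3\xA8vres\""); // bool(true)
```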

#11 Chris Guillou

Chris Guillou

    Member

  • Members
  • 10 posts

Posted 08 August 2012 - 02:50 AM

As we say over here : "Champagne pour tout le monde !" =D

So Search and Tag filtering look fine, I'll do a little more pushing and pulling here and there to make sure.
Ajax pagination is out of order (I deactivated it on the previous example in case you try it) but I can live with that as long as my client doesn't learn about that great feature ;-)

Two little things :

• Although I initiate French language support (and get it for thumbnail information) the Search button label isn't translated http://www.ville-sev...ages/v/... fête ?
• the line you asked me to change was actually on line 88 not 82, I suppose that's because I haven't updated my version of TubePress since 2.4.2 ?

So my next question is : (apart from eventually getting Ajax Pagination of course ;-) how long should I wait before updating my TP Library with an original release that will ( ? ) include all these modifications ?

I hope all this will serve other non-UTF8 encoders out there and I've said it before but I'll say it again : Merci beaucoup !

Chris

#12 eric

eric

    Lead Developer

  • TubePress Staff
  • 2787 posts

Posted 08 August 2012 - 03:37 PM

Great! We're on a roll.

Although I initiate French language support (and get it for thumbnail information) the Search button label isn't translated http://www.ville-sev...fr/ewb_pages/v/ ... +f%C3%AAte ?


The problem here is simply that the French translation file is incomplete, so TubePress defaults back to English for strings that have no translation. Here is the line in question. I see that you've managed to change the button text on your own anyway, so maybe this isn't really relevant to you.

• the line you asked me to change was actually on line 88 not 82, I suppose that's because I haven't updated my version of TubePress since 2.4.2 ?


I just confirmed that in TubePress 2.4.2 the correct line is 88 - so that makes sense. Sorry for the confusion.

So my next question is : (apart from eventually getting Ajax Pagination of course ;-) how long should I wait before updating my TP Library with an original release that will ( ? ) include all these modifications ?


I've committed the changes we've made so far on this thread (here and here), so you can expect TubePress 2.5.0 (which is the next version that will be released) will contain these fixes. I'm also going to take a swing at Ajax pagination since I have a good test environment set up. Stand by for updates on that.

I hope all this will serve other non-UTF8 encoders out there and I've said it before but I'll say it again : Merci beaucoup !


The pleasure is mine! Could not have solved these bugs without your help. Thank you!!