
Recherche avancée
Autres articles (61)
-
Soumettre améliorations et plugins supplémentaires
10 avril 2011Si vous avez développé une nouvelle extension permettant d’ajouter une ou plusieurs fonctionnalités utiles à MediaSPIP, faites le nous savoir et son intégration dans la distribution officielle sera envisagée.
Vous pouvez utiliser la liste de discussion de développement afin de le faire savoir ou demander de l’aide quant à la réalisation de ce plugin. MediaSPIP étant basé sur SPIP, il est également possible d’utiliser le liste de discussion SPIP-zone de SPIP pour (...) -
Les statuts des instances de mutualisation
13 mars 2010, parPour des raisons de compatibilité générale du plugin de gestion de mutualisations avec les fonctions originales de SPIP, les statuts des instances sont les mêmes que pour tout autre objets (articles...), seuls leurs noms dans l’interface change quelque peu.
Les différents statuts possibles sont : prepa (demandé) qui correspond à une instance demandée par un utilisateur. Si le site a déjà été créé par le passé, il est passé en mode désactivé. publie (validé) qui correspond à une instance validée par un (...) -
Les autorisations surchargées par les plugins
27 avril 2010, parMediaspip core
autoriser_auteur_modifier() afin que les visiteurs soient capables de modifier leurs informations sur la page d’auteurs
Sur d’autres sites (7198)
-
What is Behavioural Segmentation and Why is it Important ?
28 septembre 2023, par Erin — Analytics Tips -
8 Best Tools to Analyse Website Traffic
12 septembre 2023, par Erin — Analytics Tips, Marketing -
Overthinking My Search Engine Problem
31 décembre 2013, par Multimedia Mike — GeneralI wrote a search engine for my Game Music Appreciation website, because the site would have been significantly less valuable without it (and I would eventually realize that the search feature is probably the most valuable part of this endeavor). I came up with a search solution that was a bit sketchy, but worked… until it didn’t. I thought of a fix but still searched for more robust and modern solutions (where ‘modern’ is defined as something that doesn’t require compiling a C program into a static CGI script and hoping that it works on a server I can’t debug on).
Finally, I realized that I was overthinking the problem– did you know that a bunch of relational database management systems (RDBMSs) support full text search (FTS) ? Okay, maybe you did, but I didn’t know this.
Problem Statement
My goal is to enable users to search the metadata (title, composer, copyright, other tags) attached to various games. To do this, I want to index a series of contrived documents that describe the metadata. 2 examples of these contrived documents, interesting because both of these games have very different titles depending on region, something the search engine needs to account for :system : Nintendo NES game : Snoopy’s Silly Sports Spectacular author : None ; copyright : 1988 Kemco ; dumped by : None additional tags : Donald Duck.nsf Donald Duck
system : Super Nintendo
game : Arcana
author : Jun Ishikawa, Hirokazu Ando ; copyright : 1992 HAL Laboratory ; dumped by : Datschge
additional tags : card.rsn.gamemusic Card Master CardmasterThe index needs to map these documents to various pieces of game music and the search solution needs to efficiently search these documents and find the various game music entries that match a user’s request.
Now that I’ve been looking at it for long enough, I’m able to express the problem surprisingly succinctly. If I had understood that much originally, this probably would have been simpler.
First Solution & Breakage
My original solution was based on SWISH-E. The CGI script was a C program that statically linked the SWISH-E library into a binary that miraculously ran on my web provider. At least, it ran until it decided to stop working a month ago when I added a new feature unrelated to search. It was a very bizarre problem, the details of which would probably bore you to tears. But if you care, the details are all there in the Stack Overflow question I asked on the matter.While no one could think of a direct answer to the problem, I eventually thought of a roundabout fix. The problem seemed to pertain to the static linking. Since I couldn’t count on the relevant SWISH-E library to be on my host’s system, I uploaded the shared library to the same directory as the CGI script and used dlopen()/dlsym() to fetch the functions I needed. It worked again, but I didn’t know for how long.
Searching For A Hosted Solution
I know that anything is possible in this day and age ; while my web host is fairly limited, there are lots of solutions for things like this and you can deploy any technology you want, and for reasonable prices. I figured that there must be a hosted solution out there.I have long wanted a compelling reason to really dive into Amazon Web Services (AWS) and this sounded like a good opportunity. After all, my script works well enough ; if I could just find a simple Linux box out there where I could install the SWISH-E library and compile the CGI script, I should be good to go. AWS has a free tier and I started investigating this approach. But it seems like a rabbit hole with a lot of moving pieces necessary for such a simple task.
I had heard that AWS had something in this area. Sure enough, it’s called CloudSearch. However, I’m somewhat discouraged by the fact that it would cost me around $75 per month to run the smallest type of search instance which is at the core of the service.
Finally, I came to another platform called Heroku. It’s supposed to be super-scalable while having a free tier for hobbyists. I started investigating FTS on Heroku and found this article which recommends using the FTS capabilities of their standard hosted PostgreSQL solution. However, the free tier of Postgres hosting only allows for 10,000 rows of data. Right now, my database has about 5400 rows. I expect it to easily overflow the 10,000 limit as soon as I incorporate the C64 SID music corpus.
However, this Postgres approach planted a seed.
RDBMS Revelation
I have 2 RDBMSs available on my hosting plan– MySQL and SQLite (the former is a separate service while SQLite is built into PHP). I quickly learned that both have FTS capabilities. Since I like using SQLite so much, I elected to leverage its FTS functionality. And it’s just this simple :CREATE VIRTUAL TABLE gamemusic_metadata_fts USING fts3 ( content TEXT, game_id INT, title TEXT ) ;
SELECT id, title FROM gamemusic_metadata_fts WHERE content MATCH "arcana" ;
479|ArcanaThe ‘content’ column gets the metadata pseudo-documents. The SQL gets wrapped up in a little PHP so that it queries this small database and turns the result into JSON. The script is then ready as a drop-in replacement for the previous script.