2015-09-09, 14:53
A feature I'm currently working on is content add-ons. These are similar to Python plugins, but their content is scanned into the library instead of browsed through the add-ons node.
A quick perusal of the Internet Archive ROM Launcher plugin yields a delightful 10,390 games. If the "launching" part is separated from the "browsing" part, this plugin could easily become a content add-on that provides an instant game library, 10,000 games strong.
Unfortunately, the heavy torrenters out there know how sluggish the library can be with many thousands of items. Clearly, a new approach is needed.
Enter the unified content database. This database has been built to handle all library content - movies, TV shows, music, and now games as well. It can scale to tens of thousands of items and can handle binary large objects (BLOBs), so it may someday be able to synchronize fanart and game saves.
The existing libraries are built on SQL databases. The relational model offers several advantages: it is suitable for much of the data stored in the library, and it simplifies development by providing strong ACID guarantees. However, when there is a large amount of highly relational data, JOIN operations across large tables become prohibitively expensive.
The unified content database is backed by NoSQL solutions. All of a sudden, fetching an item, even one with a heavy amount of 1:N and N:M relations, becomes O(1). This fact is exploited to give incredibly fast read times, even as the library scales in size. Additionally, NoSQL is schema-less, so painful database updates are no longer required.
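To make the single-lookup read model concrete, here's a minimal sketch. A plain dict stands in for the key-value backend, and the record layout and identifiers are invented for illustration - the point is that an item's 1:N and N:M relations are embedded in one denormalized record, so a read is one key lookup instead of several JOINs:

```python
import json

# A plain dict stands in for the key-value backend (LevelDB, etc.)
store = {}

def put_item(item_id, item):
    # Store the item and all of its relations as one serialized record.
    store[item_id] = json.dumps(item)

def get_item(item_id):
    # One lookup returns the item with its relations already embedded.
    return json.loads(store[item_id])

put_item("game/1", {
    "title": "Super Example World",
    "platform": "SNES",                      # 1:N relation, embedded
    "genres": ["Platformer", "Adventure"],   # N:M relation, embedded
})

item = get_item("game/1")
```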
Unfortunately, lunch ain't free. The burden is shifted to more complex write operations. This is currently where most of my development is focused. Dropping ACID guarantees has been great for read speed, but those four little letters were actually quite helpful. So now I'm working on restoring the advantages that were lost in moving away from SQL.
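Here's a hypothetical sketch of that write-side burden: renaming a shared value (say, a platform) means rewriting every document that embeds it, where a normalized SQL schema would update a single row. The data and function are invented for illustration:

```python
# In-memory stand-in for the document store.
store = {
    "game/1": {"title": "A", "platform": "SNES"},
    "game/2": {"title": "B", "platform": "SNES"},
    "game/3": {"title": "C", "platform": "Genesis"},
}

def rename_platform(old, new):
    # Every document embedding the old value must be rewritten.
    touched = 0
    for doc in store.values():
        if doc["platform"] == old:
            doc["platform"] = new   # one write per affected document
            touched += 1
    return touched

count = rename_platform("SNES", "Super Nintendo")
```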
When this is closer to being merged with RetroPlayer, I'll return with a more detailed explanation of the internals.
Currently, I've integrated two key-value stores appropriate for embedded use: LevelDB and Kyoto Cabinet. If there's enough interest, backend support could be extended to binary add-ons. That way, thin database clients could be created similar to PVR clients. They would communicate with a network backend, such as CouchDB or MongoDB, allowing for a synchronized database in the same manner as MySQL.
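A backend-agnostic interface is what makes swapping stores like this possible. Here's a hypothetical sketch (the interface and class names are invented; real backends like LevelDB, Kyoto Cabinet, or a networked CouchDB/MongoDB client would each implement the same small set of methods):

```python
from abc import ABC, abstractmethod

class KeyValueBackend(ABC):
    """Minimal interface a database backend would implement."""
    @abstractmethod
    def get(self, key): ...
    @abstractmethod
    def put(self, key, value): ...
    @abstractmethod
    def delete(self, key): ...

class InMemoryBackend(KeyValueBackend):
    """Stand-in used here so the sketch runs without native libraries."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def put(self, key, value):
        self._data[key] = value
    def delete(self, key):
        self._data.pop(key, None)

db = InMemoryBackend()
db.put(b"game/1", b'{"title": "Example"}')
value = db.get(b"game/1")
```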
There's another feature I'm working on: unified content scraping. The idea is that the content database exposes a tree-like structure that can be browsed through a content:// URL on the VFS. This expands to all content add-ons, as well as all file sources.
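To illustrate the tree-browsing idea, here's a minimal sketch. The paths and node names are entirely invented; the real content:// layout is not finalized in this post. Directory URLs map to child lists, and leaves are plain item URLs:

```python
# Invented content:// tree for illustration only.
tree = {
    "content://": ["content://addons/", "content://sources/"],
    "content://addons/": ["content://addons/ia-roms/"],
    "content://addons/ia-roms/": ["content://addons/ia-roms/game1.rom"],
    "content://sources/": [],
}

def list_dir(url):
    # Directories map to a child list; leaves have no children.
    return tree.get(url, [])

children = list_dir("content://addons/")
```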
Initially, the leaves of this tree are simple URLs with little or no metadata/artwork. That's where Heimdall comes in (remember the GSoC 2012 project by topfs2?). I simply point Heimdall at the root of this tree and let it do its work. Heimdall runs in the background, "pruning" each leaf, filling in all the missing metadata fields/artwork. Under the hood, Heimdall is a simple inference engine using a set of Python rules to extract metadata. It builds on previous information, so any info provided by a content add-on (such as title, platform, etc.) or scraped from the file (see PyRomInfo) is used to inform Heimdall's discovery process.
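A toy version of that inference loop might look like the following. The rule names and metadata fields are invented for illustration (the real Heimdall rules are Python modules with their own structure); the key idea is that each rule fires once the fields it needs are present, possibly enabling further rules:

```python
def rule_filename_to_title(item):
    # Fires when a filename is known but no title has been inferred yet.
    if "filename" in item and "title" not in item:
        return {"title": item["filename"].rsplit(".", 1)[0].replace("_", " ")}
    return {}

def rule_title_to_artwork(item):
    # Fires only after some rule has supplied a title.
    if "title" in item and "artwork" not in item:
        return {"artwork": "fanart for " + item["title"]}
    return {}

def infer(item, rules):
    # Keep applying rules until no rule produces new information.
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(item)
            if new:
                item.update(new)
                changed = True
    return item

item = infer({"filename": "Super_Example_World.rom"},
             [rule_filename_to_title, rule_title_to_artwork])
```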
Finally, I come to the inevitable game library. The game library is defined by XML nodes similar to video library nodes. Library data is pulled directly from the unified content database.
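A game library node might look something like the following. This is a hypothetical fragment modeled on the existing video library node format; the `games` content type and field names are assumptions, not a finalized schema:

```xml
<node order="1" type="filter">
    <label>Recently Added Games</label>
    <content>games</content>
    <order direction="descending">dateadded</order>
    <limit>25</limit>
</node>
```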
Heimdall works in a data-driven manner, meaning that it processes leaves whenever data becomes available. For example, a PyRomInfo rule might discover a unique game ID embedded in the ROM, and Heimdall could then use that unique ID to perform a 1:1 match against a remote database without risking string-matching the filename against a set of similar titles. This is really cool, because PyRomInfo scraping is hundreds of times faster than pinging a remote web server, so Heimdall could decide to process ALL files with a PyRomInfo pass and defer the more expensive web lookups for later. When scraping local ROMs, this means that a rudimentary library is built within seconds, even for thousands of ROMs, and slowly embellished as more data becomes available.
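The two-pass strategy can be sketched as follows. The functions and data are invented stand-ins (the local pass represents something like PyRomInfo reading an embedded game ID; the remote lookup represents a 1:1 match against a web database), but they show how a fast first pass builds a rudimentary library while expensive lookups are deferred:

```python
from collections import deque

def local_pass(path):
    # Cheap: pretend to read an embedded game ID from the ROM header.
    return {"path": path, "game_id": "id-" + path}

def remote_lookup(entry):
    # Expensive: a 1:1 match against a remote database by unique ID.
    return {**entry, "title": "Title for " + entry["game_id"]}

def scan(paths):
    library = []
    deferred = deque()
    for path in paths:            # rudimentary library in one fast pass
        entry = local_pass(path)
        library.append(entry)
        deferred.append(entry)    # queue the slow lookup for later
    return library, deferred

library, deferred = scan(["a.rom", "b.rom"])
# Later, in the background, embellish entries one at a time:
enriched = remote_lookup(deferred.popleft())
```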