Voice recognition and control?, just basic!
#16
ekim232 Wrote:I see this thread is out dated but I recently got into voice commands and thought I would bring it up with the community.

I have voice commands working perfectly in XBMC with my Logitech mic & Dragon NaturallySpeaking. My remote is almost obsolete. I had to make a custom key map (which was not hard), but I have a few troubles with shared words and phrases that I am working out.

I am just wondering if anyone else is stepping into these waters and has a better method or program.

I am happy to share what I know with anyone because it is quite impressive to sit on your couch and say "Info" and movie info pops up. Or "Play", "Pause" and all basic commands. "Update Video" will launch my scan. Just saying "launch XBMC" will start XBMC flawlessly.

which version do you have?

is there something that can work in linux?
Nvidia Shield with Kodi 18
Reply
#17
rflores2323 Wrote:which version do you have?

is there something that can work in linux?

I am using Dragon NaturallySpeaking 9.51 Pro. I am not sure if there is a Linux version or not. I got mine from -

Very easy to set up and map your commands.
Reply
#18
voice recognition might be a cool feature, but you guys are missing something...
Let's say you have your HTPC always on, like I do for the most part, and you're sitting in your living room or wherever and talking to a friend on the phone... your conversation goes something like "hey man, so we were going to go play..." and all of a sudden your TV turns on and starts playing a movie... or you're talking about how "...it's ok. Next time..." and the CD you were listening to skips to the next track.

See what I'm getting at? There would need to be some serious thought into command structure to prevent accidents like that from happening, or it would be a serious hindrance.
Board: Zotac ION-A-U Case: M350 Mini ITX Memory: 4GB Patriot PC6400 OS: XBMC on OpenELEC.tv build 6936 on a Corsair 32GB SSD Media Storage: W2K8 running on 14TB RAID 5 on an Asrock board w/ AMD Athlon X2 250 and PERC 6/I controller w/ 8 Samsung HD204UI Green drives Time to interface from power switch: 22.4 seconds.
Reply
#19
Evanrich Wrote:voice recognition might be a cool feature, but you guys are missing something...
Let's say you have your HTPC always on, like I do for the most part, and you're sitting in your living room or wherever and talking to a friend on the phone... your conversation goes something like "hey man, so we were going to go play..." and all of a sudden your TV turns on and starts playing a movie... or you're talking about how "...it's ok. Next time..." and the CD you were listening to skips to the next track.

See what I'm getting at? There would need to be some serious thought into command structure to prevent accidents like that from happening, or it would be a serious hindrance.

That is why you get good software. It trains to the way you speak and picks up on patterns. I leave my mic on all day and I rarely ever have XBMC do a command I don't tell it to. I know not to use "Play" unless it is said solely, with nothing to follow. It ignores words that launch or execute programs if they are followed by others.
Reply
#20
You can look at VoxForge, which is a free speech corpus and acoustic model repository for open source speech recognition engines: http://www.voxforge.org/
Reply
#21
Another very simple thing that always works: give your commands a prefix, like " command play "

or " command start xbmc "

A name for your PC works best, like gizzi (you will never use something like this :) )
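The prefix idea can be sketched in a few lines of Python. This is only an illustration and assumes the recognizer already hands back the utterance as plain text; the function name `extract_command` is made up for this example, and "gizzi" is just the name suggested above.

```python
from typing import Optional

WAKE_WORD = "gizzi"  # hypothetical PC name, taken from the suggestion above


def extract_command(utterance: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the command that follows the wake word, or None if it is absent."""
    words = utterance.lower().split()
    # Anything that does not start with the wake word is treated as ordinary
    # conversation and ignored. The wake word alone is not a command either.
    if len(words) < 2 or words[0] != wake_word:
        return None
    return " ".join(words[1:])
```

With this, "gizzi play" yields the command "play", while a phone conversation that happens to end in "play" yields nothing.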
Reply
#22
Hey, I don't know if anyone is still watching this thread, but if you are interested in voice control for XBMC (and mediaMonkey, and your x10 home automation etc) check this out.

http://www.voxcommando.com

you can do a lot more than just say pause and play, and there are various modes for dealing with "when you want the computer to listen, and when not".

the demo is using mediaMonkey but I am working on xbmc and it basically works. I just need to expand the command set.

EDIT: here's a more up to date demo of VoxCommando controlling XBMC (skin is Aeon MQ2)

http://www.youtube.com/watch?v=0gYDcandl...re=feedlik
Reply
#23
hmm, doesn't look too platform independent and even less open source.
I'd prefer something like
http://julius.sourceforge.jp/en_index.php
speech recognition is very time consuming though (at least it was some years back, when I last tried...)
Reply
#24
frostwork Wrote:speech recognition is very time consuming though (at least it was some years back, when I last tried...)

Speech recognition is very demanding. Opensource solutions require lots of training still (commercial applications have gotten much better without training, but would cost a lot to implement).

Voice control, on the other hand, is much easier. Since the exact words you will be using are already defined, you don't need to train. Basically you have a very short list of commands. It just picks the one that sounds closest to what you said.

"Play" sounds different from "Stop", "Rewind", "Pause", "Fast Forward" etc. So if you mumble a bit, the room is noisy, you speak with an accent or anything like that, it still sounds closer to "Play" than any other known command.
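The "pick the closest one from a short list" idea can be illustrated with Python's standard `difflib` module. Note the big assumption: this compares spelling, not sound. A real engine compares acoustic features, so the sketch below is only a text stand-in for the principle, and `closest_command` is a name invented for this example.

```python
import difflib

# The short, fixed command list from the post.
COMMANDS = ["play", "stop", "rewind", "pause", "fast forward"]


def closest_command(heard: str) -> str:
    """Return the known command most similar to what was heard.

    cutoff=0.0 means some command is always returned, mirroring the idea
    that the recognizer simply picks the nearest known command.
    """
    matches = difflib.get_close_matches(heard.lower(), COMMANDS, n=1, cutoff=0.0)
    return matches[0]
```

A garbled input such as "pley" still lands on "play" because nothing else in the list is nearly as close. In practice you would want a non-zero cutoff so that unrelated speech is rejected instead of being forced onto a command.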

If you tried to use it for jump-lists (jump to movies starting with a certain letter) and you say "C", it could misinterpret that as "B", "D", "E", "G", "P", "T", "V" or "Z" because they all sound similar.

You run into the same issue with Movie Titles. You Say "The God Father II" but it plays "The God Father" before it hears you say "2". There would have to be some minor delay based on the volume in the room, which in a noisy room would be a long delay.

Movie Titles also come in many languages. We would have to pronounce everything phonetically, rather than properly based on the pronunciation of that language.

This stuff can all be done, but like I said, it costs to get those APIs as a redistributable package, and since XBMC is free, where's that money going to come from?

Basically, the only practical implementation at this point would be simple commands like those found on a standard universal remote. In my mind, it's not worth supporting for such limited functionality. It's got a certain "Cool Factor", but the novelty would quickly wear off.
Reply
#25
It sounds like you are judging the book without even looking at the cover.

I am the developer of VoxCommando Nerd, so obviously I am biased... BUT!

While VoxCommando is not cross platform (it runs on Windows 7 or Vista, 32 or 64 bit), it connects to XBMC through a web interface, so the machine running XBMC can be on any platform.

It is not open source, but it is currently free (I haven't decided what I'll do here yet), and you can customize everything. You can choose what commands you want to use and what words you want to trigger them. It is extremely flexible.

It doesn't have any problem differentiating between god father and god father 2. It knows what media you have in your library. If you say play movie god father and it knows you also have god father 2 in your library, it will wait for a short period before acting, giving you time to say "two". If you only say god father, or even just say God, it can give you a list of movies that match that word(s) and then you can say option 1, or option 2.
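To make the library lookup concrete, here is a rough Python sketch of the matching step only. It is not VoxCommando's actual code; the function names and the tiny library are invented for illustration, and the short wait for a possible "two" is not modelled.

```python
from typing import List, Tuple, Union


def match_titles(spoken: str, library: List[str]) -> List[str]:
    """Return every library title that contains all of the spoken words."""
    wanted = spoken.lower().split()
    return [
        title for title in library
        if all(word in title.lower().split() for word in wanted)
    ]


def resolve(spoken: str, library: List[str]) -> Tuple[str, Union[str, List[str]]]:
    """Play a unique match, otherwise offer numbered options."""
    matches = match_titles(spoken, library)
    if len(matches) == 1:
        return ("play", matches[0])
    if matches:
        options = [f"option {i}: {title}" for i, title in enumerate(matches, 1)]
        return ("choose", options)
    return ("none", [])
```

With a library containing "God Father" and "God Father 2", saying "god father" produces a two-item option list, while "god father 2" matches exactly one title and plays it.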

I have over 6000 songs in my library and I can ask for *almost* any one of them by song title, artist name or album name. It is true that some foreign language artist names, or movie names can sometimes be tricky, but for me this represents < 1%, and usually I can try again and pronounce it "as an american would". Even if that were a stopping point for you, you can still generate custom playlists etc and ask for them by name. My favourite is being able to rate music when my hands are full.

All of this functionality is currently possible with a correct setup going through EventGhost. I have successfully added the ability to send commands directly to xbmc, but I am still working on the more complicated stuff. If I can find the time it should be good to go in a couple of weeks.

You can run it in different languages too. It uses Microsoft's (surprisingly powerful) speech engine, and will operate in whatever language is installed on your version of Windows.

It's also not limited to XBMC. Here's a guy using it in German with WMC. We just got that up and running recently so there are still a few kinks to iron out.

http://www.youtube.com/watch?v=BaGlIwXQM...r_embedded
Reply
#26
and for jump lists you could easily use words instead. "jump to ***" where *** is any word that starts with P would take you to letter P.
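A small Python sketch of that jump-list trick, assuming the recognized phrase arrives as text. The pattern and the function name `jump_letter` are made up for this example.

```python
import re
from typing import Optional

# "jump to <any word>": only the first letter of the word is used.
JUMP_PATTERN = re.compile(r"^jump to ([a-z]+)$", re.IGNORECASE)


def jump_letter(utterance: str) -> Optional[str]:
    """Return the letter to jump to, or None if this is not a jump command."""
    match = JUMP_PATTERN.match(utterance.strip())
    if match is None:
        return None
    return match.group(1)[0].upper()
```

So "jump to penguin" goes to P, which avoids having to tell "B", "D", "P" and "T" apart by sound.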
Reply
#27
That actually sounds pretty good. Your jump list solution seems like it would work well, but to be honest, I (and most users I think) would naturally state the letter, expecting that as the functional default. Saying part of a word seems cumbersome.

That's the real challenge. Making something that understands the user, rather than having hundreds of defined commands for the user to learn. That was the discrepancy between most free and commercial products I was referring to (I wasn't very clear). Natural speech recognition. When I say "Voice control", I mean the system hears a word and interprets it as a command. When I say "speech recognition", I expect the machine to hear the words and infer a conceptual meaning to them.

For instance, let's say we were watching a movie and you were holding the remote. The phone rings. I say "Hang on a second", and you infer "Pause" from that. I could say "wait", "hold up", "pause it", basically any of a million commands that mean the same thing, based on a context you as a person would understand. The machine doesn't know the phone rang, nor does it infer that I would like to pause while answering. Of course it can be taught to recognize those situations, but when getting into every possible situation and verbal permutation, it's not realistic.

That's the real gap, understanding context rather than directive. Some systems out there do a very good job of this, utilizing basic AI. Others just listen for an explicit command.

I know, it's nit-picking. But I believe programs should adapt to the manner in which individual users will utilize them, rather than the user learning to control the system through strict rules.

I'd be interested in trying out your voice control, as you've described it, it sounds very good. If not the Intelligent system of the future I dream of, it's a step in the right direction.

I understand the reluctance to open source a program. That doesn't have to exclude you from community development, however. If a user can customize a command, those customizations could be shared in an online database. If a skin author, for instance, wanted to create a custom command for a unique function, they could submit it to the database. Pending an editorial review, it would be added. Now if an end user gives a command unknown to their system, it could check online for that command and execute as directed. This would quickly teach it the variable ways of saying one thing, and expand the functionality.

You could have a pseudo-intelligent system, similar to online chat bots, where it learns from the community and grows to fit their needs without one person grinding away for months trying to anticipate every possible command. That would also keep the program itself solely in your control, and user submitted commands would essentially be your property, as the editorial control you would naturally exercise to prevent things like duplicate triggers or undesirable effects would also lend to the uniqueness of your program.

Of course this depends on how your program works. Can it interpret text as having a phonetic sound and then listen for that sound, or does it require actual audio for the comparison? Obviously, you wouldn't want to send audio samples over the net to clients, but as text you could have tens of thousands of commands in a file of less than a MB and sync up clients with the DB periodically, much like the scrapers in XBMC do.

Another possibility would be to add remote functionality via smartphones like the iPhone or Android. A simple app that sends the command from the phone over TCP to the server running the recognition program would greatly extend functionality.

I'll keep an eye on your program, there are lots of possibilities for it.
Reply
#28
Hi arkryal,

thanks for your thoughtful response. I suspect that it is going to be a long time before computers learn how to be used by us. For the time being, we have to accept the fact that pretty much no matter what you want to do with a computer, you will have to learn to use it, to some extent. Of course some things are easier to learn than others, and my goal (obviously) is to make it as easy as possible. In its current state of development, VoxCommando is not ready for the general public. It is for geeks like me. It was a personal project that no one else was using only a few weeks ago. That said, at its core is a product that is flexible enough to be of great value (I think) to a wide variety of people already, and many more in the near future.

It is currently possible to create multiple phrases to trigger the same command, and with the use of wildcards and optional elements (e.g. "please" at the beginning of a sentence, or an optional "now!" at the end) you can make it easier to remember or guess the commands. I don't believe being able to say anything is practical at this point in time, and even if it were, I think that is where we start to lose control and the computers turn us into batteries!.. Shocked [this forum has the best emoticons] It is also the limited command set that virtually eliminates the need to do training... (though personally I don't see why people are so against doing a bit of training!)
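For anyone curious what "multiple phrases plus optional elements" looks like in practice, here is a rough Python sketch using regular expressions. It is not how VoxCommando stores or matches phrases; the phrase table and function names are invented for illustration.

```python
import re
from typing import Dict, List, Optional

# Several phrases trigger the same command (hypothetical table).
PHRASES: Dict[str, List[str]] = {
    "pause": ["pause", "pause it", "hold up", "wait"],
    "play": ["play", "resume"],
}


def build_pattern(phrases: List[str]) -> "re.Pattern[str]":
    """Build one pattern with an optional leading 'please' and trailing 'now'."""
    # Longest phrases first so "pause it" is tried before "pause".
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in ordered)
    return re.compile(rf"^(?:please )?(?:{alternatives})(?: now)?$", re.IGNORECASE)


PATTERNS = {command: build_pattern(p) for command, p in PHRASES.items()}


def match_command(utterance: str) -> Optional[str]:
    """Return the command triggered by the utterance, or None."""
    text = utterance.strip()
    for command, pattern in PATTERNS.items():
        if pattern.match(text):
            return command
    return None
```

"please pause now", "hold up" and "pause it" all map to the same pause command, while a longer sentence that merely contains "play" matches nothing.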

All the commands and the phrases that trigger them are currently stored in xml trees, and it is possible to copy and paste from one tree to another with a text editor. So sharing is already possible. I do plan to first, make it much easier to merge and edit trees from multiple files, by selecting the file and then selecting the branches and nodes of the tree that you want to import, and later to create an online database that users can upload their trees to. Initially we can use the forum to post and download xml files, (hopefully with a meaningful explanation attached).
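As a rough idea of what merging selected branches from a shared file could look like, here is a Python sketch using the standard `xml.etree.ElementTree` module. The XML layout (`commands`, `group`, `command`, `phrase`, `action`) is entirely made up for this example; the real VoxCommando file format is not shown in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Set

# Hypothetical layout, NOT the actual VoxCommando schema.
MINE = """<commands>
  <group name="xbmc"><command phrase="pause" action="Pause"/></group>
</commands>"""

SHARED = """<commands>
  <group name="xbmc"><command phrase="hold up" action="Pause"/></group>
  <group name="itunes"><command phrase="next song" action="Next"/></group>
</commands>"""


def merge_groups(target_xml: str, source_xml: str, wanted: Set[str]) -> ET.Element:
    """Copy only the wanted groups from the source tree into the target tree."""
    target = ET.fromstring(target_xml)
    source = ET.fromstring(source_xml)
    for group in source.findall("group"):
        name = group.get("name")
        if name not in wanted:
            continue  # opt-in: skip branches the user did not select
        existing = target.find(f"group[@name='{name}']")
        if existing is None:
            target.append(group)
        else:
            existing.extend(list(group))  # add the new commands to the group
    return target
```

Calling `merge_groups(MINE, SHARED, {"xbmc"})` adds the shared "hold up" phrase to the xbmc group and leaves the itunes branch out.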

I think for the foreseeable future, downloading new commands and phrases will always be an opt-in situation, and not automatic. I like the idea though, and perhaps I could create a feature where the user could scan the database for all the commands they are already using, and have optional phrases be suggested that they could easily adopt or reject. it's a very good nugget to put in the idea pouch, but you have to understand that different users might use the exact same word or phrase to do completely different things, and remember this software is not just for xbmc.

Even if the software never goes open source and I decide to start charging for it, I will always encourage the concepts of customization and user cooperation. Otherwise, we might as well just use the built in OS commands. The whole reason I started this project was that I did not want to be told "what I could say", and "what I could do".

cheers. J
Reply
#29
Well, I downloaded and tried it out. I'm surprised at the level of accuracy, even with me mumbling a bit, a movie playing and a fan on, the commands were coming back in the mid 90% range.

I did notice a few minor things. I don't see a method via the options interface to set your XBMC User name and password for the HTTP control. Pause works, but the play next command is stopping the video, and for some reason it is setting the volume to 5% seemingly at random.

It's also using very few system resources compared to other programs. Not bad for people re-purposing old PCs as HTPCs.

Definitely not ready for end-user release just yet, but I underestimated its potential.

I'll take a closer look when I get home tonight, but excellent work so far.
Reply
#30
glad to hear you had a mostly positive experience.

go to edit the command tree and remove all the stuff you won't be using like itunes, and wmc commands.

Maybe take a look at this. It's my first crack at a tutorial, and I will probably give it another go, but since you are totally new to Vox you might find it helpful.

http://voxcommando.com/install01/

(you'll want to skip the part about downloading and unzipping of course... jump forward to time 3:15)

If you know anything about the xbmc http interface then I could probably use your help. I was actually too lazy (or rushed) to figure out the username and password thing, whilst trying to get a proof-of-concept out the door. I added (limited) support for xbmc, itunes and wmc all in the last couple of weeks, but if someone is actually using it, and wants that implemented, that will motivate me. There is a fellow using it with WMC in German and that motivated me to do a lot of work on WMC.

Note that you can add and modify your own xbmc commands if you are familiar with the xbmc web api interface (which is detailed here: http://wiki.xbmc.org/index.php?title=Web...r_HTTP_API ), and in that case you could probably help me out a lot. Notice on that page that the 2nd example is as follows:
http://xbox/xbmcCmds/xbmcHttp?command=setvolume(80)

to execute that command in VC (voxcommando) is easy, you create a command in the xbmc group and enter this string: setvolume(80)
It's not very practical to create commands for every volume level so we use what I call "payloads" (concept from EventGhost)
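Here is a small Python sketch of the payload idea: one command template, with the spoken number filled in at run time. The `{payload}` placeholder syntax and the function name are assumptions made for this example (VoxCommando's own payload syntax is not shown here), and the host name `xbox` simply comes from the wiki example above. The function only builds the URL and does not send anything.

```python
from urllib.parse import quote

# Base address from the wiki example; replace "xbox" with your own host.
BASE_URL = "http://xbox/xbmcCmds/xbmcHttp"


def build_url(command_template: str, payload: int) -> str:
    """Fill the payload into the command template and build the request URL."""
    command = command_template.format(payload=payload)
    # Keep the parentheses readable, as in the wiki example.
    return f"{BASE_URL}?command={quote(command, safe='()')}"
```

`build_url("setvolume({payload})", 80)` gives exactly the URL from the wiki example, and the same template covers every other volume level.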

anyway take a look at the command tree in VC and compare it to the web api interface to xbmc and you'll get the idea.

btw you mentioned iphone ipods. I do have two methods of using them to send voice commands to VC, but no direct feedback so they don't work well if you aren't actually in the house! Here's one of them...

http://www.youtube.com/watch?v=eTkO3AzYJ...r_embedded

why do I feel like this is me? Angry
Reply