Wednesday, March 30, 2011

Carlitos’ Projects: Speech-Controlled Arduino Robot

We all dream of having appliances and machines that can obey our spoken commands. Well, let’s take the first step towards making this happen. In this second iteration of Carlitos’ Projects, we are going to build a speech-controlled Arduino-based robot.

Speech Controlled Arduino Robot

You may be thinking that making such a robot must be a very complex task. After all, humans take many years before they can understand speech properly. Well, it is not as difficult as you may think and it is definitely lots of fun. The video below illustrates how to make your own speech-controlled Arduino rover.

After watching the video, read below the detailed list of parts and steps required to complete the project.


Parts:

  • A DFRobotShop Rover kit. This is the robot to be controlled.
  • A VRbot speech recognition module. It processes the speech and identifies the commands.
  • Two Xbee RF communication modules. They create a wireless link between the speech recognition engine and the robot.
  • An Arduino Uno. It controls the speech recognition module.
  • An IO expansion shield. It allows the Xbee module to be connected to the DFRobotShop Rover.
  • An Xbee shield. It allows an Xbee module to be connected to the Arduino Uno.
  • Male headers. They are required by the Xbee shield.
  • A barrel jack to 9V battery adaptor. It allows the Arduino Uno to be powered through a 9V battery.
  • An LED. It is not strictly required, since the IO expansion shield already has one, but it can provide more visible activity feedback.
  • An audio jack. It will be used to connect the microphone (optional).
  • A headset or a microphone (a microphone is included with the speech recognition module).


Tools:

  • A wire cutter. It will be used to cut the leads off components.
  • A soldering iron. In order to solder all the (many) connections, a soldering station might be preferable, since it provides steady and reliable temperature control that allows for easier and safer soldering (there is less risk of burning the components if the temperature is set correctly).
  • A third hand. This is not absolutely required, but it is always useful for holding components and parts when soldering.
  • A hot-glue gun, to stick the components together.
  • A computer. It is used to program the DFRobotShop Rover and the Arduino Uno using the Arduino IDE.

Putting it Together

  1. Assemble the DFRobotShop Rover and mount the IO expansion shield, an Xbee module and the LED. See the picture above or the video for further information.
  2. Solder the headers onto the Xbee shield. Also solder four headers on the prototyping area as shown below. Do not like soldering? Then keep reading, since there is a no-solder-required version of the project.
    Speech Engine - 2
  3. Connect the four headers to the corresponding pins as shown below.
    Speech Engine - 3
  4. As shown above, you can also mount the headphone jack and use the cable included with the microphone in order to connect it to the VRbot module microphone input.
  5. Put the shield onto the Arduino and connect the battery.
    Speech Engine - 4
  6. Connect the VRbot speech recognition module wires and the microphone.
    Speech Engine - Back
  7. Program the DFRobotShop Rover and the Arduino Uno with their respective programs.
  8. Start talking to your robot! Say “forward”, “backward”, “left”, or “right” in order to make the robot move in the desired direction. The word “move” shown in the video has been removed from the program in order to improve the performance.
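The original post links the actual Arduino sketches for the Rover and the Uno, which are not reproduced here. As an illustrative sketch (in Python, not the real firmware), the core dispatch logic maps each recognized command word to a pair of wheel directions; the command table and direction encoding below are assumptions for illustration only.

```python
# Illustrative sketch of the command-dispatch logic described in step 8.
# Each recognized word maps to (left wheel direction, right wheel direction):
# 1 = forward, -1 = backward.
COMMANDS = {
    "forward":  (1, 1),    # both wheels forward
    "backward": (-1, -1),  # both wheels backward
    "left":     (-1, 1),   # spin counter-clockwise
    "right":    (1, -1),   # spin clockwise
}

def dispatch(word):
    """Return the motor directions for a recognized word, or None if the
    word is not a known command (e.g. "move", which was removed)."""
    return COMMANDS.get(word)
```

On the real robot this table would live in the Arduino sketch, with the recognized word arriving over the Xbee serial link.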

More at ...

Friday, March 25, 2011

What's New in iPad 2

Apple has released the new iPad 2 as promised. Following are a few features you will find in the new iPad 2.

1. iPad 2 has a new design and is 33% thinner and 15% lighter than the first iPad.

2. iPad 2 runs on a new dual-core A5 processor.

3. iPad 2 has 2 cameras. One is a front-facing VGA camera and the other one is a rear-facing camera that allows users to capture 720p HD videos. The front-facing VGA camera supports Apple’s FaceTime and iPad 2 is the first iPad to get this.

4. Comes with iOS 4.3 (the latest version), which promises faster mobile browsing, among other improvements.

5. Has a built-in gyroscope that enables advanced gaming.

iPad 2 - Courtesy of Apple

6. Supports HDMI Video Mirroring, allowing users to stream media on HDTVs.

7. iPad 2’s launch also saw the introduction of 2 new apps; iMovie and GarageBand. In a nutshell, iMovie allows you to shoot and edit videos with your iPad 2 and upload them to video sites. GarageBand on the other hand will make a musician out of you.

In Ireland you can buy it from here...

Wednesday, March 23, 2011

Monirobo Measures Radiation in Japan

According to a report by a Japanese news agency, a radiation monitoring robot, aptly named Monirobo, is the first non-human responder to go on-site following the partial meltdown at the Fukushima Daiichi nuclear power plant. The machine, which was developed by Japan's Nuclear Safety Technology Centre to operate at lethal radiation levels, reportedly began work Friday, enlisting a 3D camera, radiation detector, and heat and humidity sensors to monitor the extent of the damage. A second Monirobo, used to collect samples and detect flammable gases, is expected to join its red counterpart soon -- both robots are operated by remote control from distances up to one kilometer away. They join the US Air Force's Global Hawk drone in unmanned surveillance of the crisis.

Friday, March 11, 2011

Microsoft® Community Contributor Award

Dear Mubshir Raza,

Congratulations again for being recognized with the Microsoft Community Contributor Award!

The Microsoft Community Contributor Award is reserved for participants who have made notable contributions in Microsoft online community forums such as TechNet, MSDN and Answers. The value of these resources is greatly enhanced by participants like you, who voluntarily contribute your time and energy to improve the online community experience for others.

Becoming a Microsoft Community Contributor Award recipient includes access to important benefits, such as complimentary resources to support you in your commitment to Microsoft online communities. To find out more about the Microsoft Community Contributor Award and to claim your recognition, please visit this site:

Thank you for your commitment to Microsoft online technical communities and congratulations again!

Nestor Portillo
Community & Online Support, Microsoft
It was great news. A Big Thanks to Microsoft for the Recognition!

Sunday, February 13, 2011

Microsoft Research Audio Video Indexing System (MAVIS)

The Microsoft Research Audio Video Indexing System (MAVIS) is a set of software components that use speech recognition technology to enable searching of digitized spoken content, whether it comes from meetings, conference calls, voice mails, presentations, online lectures, or even Internet video.


As the role of multimedia continues to grow in the enterprise, government, and the Internet, the need for technologies that better enable discovery and search of such content becomes all the more important.

Microsoft Research has been working in the area of speech recognition for over two decades, and speech-recognition technology is integrated in a number of Microsoft products, such as Windows 7, Exchange 2010, and Office OneNote. Using the integrated speech-recognition technology in the Windows 7 operating system, users can dictate into applications like Microsoft Word, or use speech to interact with their Windows system. Microsoft's speech-enabled directory service allows mobile users to look up listings by voice while on the go. Exchange 2010 now provides a rough transcript of incoming voicemails, and in Office OneNote, users can search their speech recordings using keywords.

MAVIS adds to the list of Microsoft applications and services that use speech recognition. MAVIS is designed to enable searching of 100s or even 10,000s of hours of conversational speech with different speakers on different topics. The MAVIS UI, which is a set of aspx pages, resembles that of a web search UI, as illustrated below, but can be changed to suit different applications.

MAVIS comprises speech-recognition software components that run as a service in the Windows Azure Platform, full-text search components that run in SQL Server 2005/2008, sample aspx pages for the UI, and some client-side PowerShell and .NET tools. The MAVIS client-side tools make it easy to submit audio and video content to the speech recognition application running in the Azure service using an RSS-formatted file, and to retrieve the results so they can be imported into SQL Server for full-text indexing, which enables the audio and video content to be searched just like other textual content.
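The submission side of this pipeline is an RSS-formatted file listing the media to process. A minimal sketch of generating such a file with Python's standard library follows; the element layout is an assumption for illustration (the exact schema MAVIS expects is not documented in this post).

```python
import xml.etree.ElementTree as ET

def build_submission_rss(media_urls):
    """Build a minimal RSS 2.0 document listing media files to submit.
    The choice of <enclosure> to carry the media URL is illustrative,
    not MAVIS's documented schema."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "MAVIS submission"
    for url in media_urls:
        item = ET.SubElement(channel, "item")
        # Use the file name as the item title.
        ET.SubElement(item, "title").text = url.rsplit("/", 1)[-1]
        ET.SubElement(item, "enclosure", url=url, type="audio/wav")
    return ET.tostring(rss, encoding="unicode")
```

The returned XML string can then be saved and handed to the client-side submission tools.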

MAVIS is currently a research project with a limited technical preview program. If you have deployed Microsoft SQL Server, have large speech archives and are interested in the MAVIS technical preview program, contact us.

MAVIS Architecture

Speech-Recognition for Audio Indexing Backgrounder

There are two fundamentally different approaches to speech recognition, one referred to as phonetic indexing and the other as large-vocabulary continuous speech recognition (LVCSR).

  • Phonetic indexing is based on phonetic representations of the pronunciation of the spoken terms and has no notion of words. It performs phonetic-based recognition during the indexing process, and at search time the query is translated into its phonetic spelling, which is then matched against the phonetic recognition result. Although this technique has the advantage of not depending on a preconfigured vocabulary, it is not appropriate for searching large audio archives of 10,000s of hours because of the high probability of errors using phonetic recognition. It is, however, appropriate for relatively small amounts of audio, as might be the case when searching personal recordings of meetings or lectures. Microsoft has utilized this technique with success to enable the “Audio Search” feature in Office OneNote 2007.
  • Large-vocabulary continuous speech recognition, or LVCSR, which is used in MAVIS, turns the audio signal into text using a preconfigured vocabulary and language grammar. The resulting text is then indexed using a text indexer. The LVCSR technique is appropriate for searching large audio archives, which can be 10,000s of hours in length. The vocabulary can be configured to enable recognition of proper nouns such as names of people, places, or things.

Although LVCSR-based audio search systems can provide more accurate search results than phonetic-based systems, state-of-the-art LVCSR speech-recognition accuracy on conversational speech is still not perfect. Researchers at MSR Asia have developed a more accurate technique called “Probabilistic Word-Lattice Indexing”, which takes into account how confident the recognition of a word is, as well as what alternate recognition candidates were considered. It also preserves time stamps to allow direct navigation to keyword matches in the audio or video.

Probabilistic Word-Lattice Indexing

For conversational speech, typical speech recognizers can only achieve accuracy of about 60%. To improve the accuracy of speech search, Microsoft Research Asia developed a technique called “Probabilistic Word-Lattice Indexing,” which helps to improve search accuracy in three ways:

  • Fewer false negatives: Lattices make it possible to find (sub-)phrase and ‘AND’ matches where individual words are of low confidence, but the fact that they are queried together allows us to infer that they may still be correct. Word lattices represent alternative recognition candidates that were considered by the recognizer but did not turn out to be the top-scoring candidate.
  • Fewer false positives: Lattices also provide a confidence score for each word match, which can be used to suppress low-confidence matches.
  • Time stamps: Lattices, unlike text, retain the start times of spoken words, which is useful for navigation.

Word lattices accomplish this by representing the words that may have been spoken in a recording as a graph structure. Experiments show that indexing and searching this lattice structure instead of plain speech-to-text transcripts significantly improves document-retrieval accuracy for multi-word queries (30-60% for phrase queries, and over 200% for AND queries).
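The scoring idea behind these three benefits can be illustrated with a toy example: treat the lattice as a set of scored, time-stamped word hypotheses and require every word of an AND query to appear with sufficient confidence. This is a simplification of a real lattice, which is a graph of alternatives; all the words, times, and confidences below are invented for illustration.

```python
# Toy "lattice": each entry is (word, start_time_in_seconds, confidence).
# Real lattices are graphs of recognition alternatives; a flat list of
# scored hypotheses is enough to show the confidence-weighted matching idea.
lattice = [
    ("weather", 1.2, 0.9),
    ("whether", 1.2, 0.4),    # alternative candidate at the same time
    ("forecast", 1.8, 0.35),  # low confidence on its own
]

def and_match(query_words, lattice, threshold=0.2):
    """Score an AND query: every query word must appear somewhere in the
    lattice with confidence above the threshold. Returns a map of
    word -> (best confidence, start time), or None if any word is missing.
    The retained start times are what enables direct navigation."""
    hits = {}
    for word, start, conf in lattice:
        if word in query_words and conf >= threshold:
            if word not in hits or conf > hits[word][0]:
                hits[word] = (conf, start)
    return hits if set(query_words) <= set(hits) else None
```

Even though "forecast" is a low-confidence hypothesis on its own, querying it together with "weather" lets it contribute to a match, which is the intuition behind the reduction in false negatives.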

For more information about Microsoft Research Asia’s basic lattice method, see:

A challenge in implementing probabilistic word-lattice indexing is the size of the word lattices. Raw lattices as obtained from the recognizer can contain hundreds of alternates for each spoken word. To address this challenge, MSRA has devised a technique referred to as Time-based Merging for Indexing, which brings the lattice size down to about 10× the size of a corresponding text-transcript index. This is orders of magnitude less than using raw lattices.
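The merging idea can be sketched as follows: collapse duplicate word hypotheses whose start times nearly coincide, keeping only the best confidence for each. This is a rough illustration of time-based merging under assumed data shapes, not MSRA's actual algorithm.

```python
def time_merge(entries, window=0.1):
    """Merge duplicate (word, start_time, confidence) hypotheses whose
    start times fall within `window` seconds of each other, keeping the
    highest confidence. Shrinks the index by discarding near-duplicate
    alternates for the same spoken word."""
    merged = []
    # Sort by word, then time, so duplicates become adjacent.
    for word, start, conf in sorted(entries, key=lambda e: (e[0], e[1])):
        if merged and merged[-1][0] == word and abs(merged[-1][1] - start) <= window:
            if conf > merged[-1][2]:
                merged[-1] = (word, start, conf)  # keep the better score
        else:
            merged.append((word, start, conf))
    return merged
```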

Tuesday, January 11, 2011

Introducing Google Instant

Google Instant is a new search enhancement that shows results as you type. We are pushing the limits of our technology and infrastructure to help you get better search results, faster. Our key technical insight was that people type slowly, but read quickly, typically taking 300 milliseconds between keystrokes, but only 30 milliseconds (a tenth of the time!) to glance at another part of the page. This means that you can scan a results page while you type.
The most obvious change is that you get to the right content much faster than before because you don’t have to finish typing your full search term, or even press “search.” Another shift is that seeing results as you type helps you formulate a better search term by providing instant feedback. You can now adapt your search on the fly until the results match exactly what you want. In time, we may wonder how search ever worked in any other way.


Faster Searches: By predicting your search and showing results before you finish typing, Google Instant can save 2-5 seconds per search.
Smarter Predictions: Even when you don’t know exactly what you’re looking for, predictions help guide your search. The top prediction is shown in grey text directly in the search box, so you can stop typing as soon as you see what you need.
Instant Results: Start typing and results appear right before your eyes. Until now, you had to type a full search term, hit return, and hope for the right results. Now results appear instantly as you type, helping you see where you’re headed, every step of the way.
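The prediction behaviour described above can be approximated with a very simple model: rank logged queries by popularity and surface the most popular one that matches the typed prefix (shown in grey in the search box). The query log below is invented for illustration; Google's real ranking uses far richer signals.

```python
# Hypothetical query log: query string -> popularity count.
QUERY_LOG = {
    "weather": 1000,
    "weather radar": 400,
    "web design": 250,
}

def top_prediction(typed):
    """Return the most popular logged query that starts with the typed
    prefix, or None if nothing matches. This is the prediction that would
    be shown in grey as the user types."""
    candidates = [q for q in QUERY_LOG if q.startswith(typed)]
    return max(candidates, key=QUERY_LOG.get) if candidates else None
```

With this model, typing just "we" already surfaces "weather", which is why the user can stop typing as soon as the prediction matches their intent.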

Saturday, January 1, 2011

NodeXL: Network Overview, Discovery and Exploration in Excel

NodeXL is a powerful and easy-to-use interactive network visualisation and analysis tool that leverages the widely available MS Excel application as the platform for representing generic graph data, performing advanced network analysis and visual exploration of networks. The tool supports multiple social network data providers that import graph data (nodes and edge lists) into the Excel spreadsheet.

The tool includes an Excel template for easy manipulation of graph data:

Sample networks generated with NodeXL:

Project contributors:

The NodeXL Template in Action

This is what the NodeXL template looks like. In this example, a simple two-column edge list was entered into the Edges worksheet, and the Show Graph button was clicked to display the network graph in the graph pane on the right.


The two-column edge list is all that’s required, but you can extensively customize the graph’s appearance by filling in a variety of optional edge and vertex columns. Here is the same graph after color, shape, size, image, opacity, and other columns were filled in.
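The underlying data model here is simple: a two-column edge list fully defines the network. A minimal sketch, outside Excel, of turning such an edge list into an adjacency structure (the vertex names are illustrative, not from any NodeXL sample):

```python
# A two-column edge list, like the rows NodeXL reads from its Edges worksheet.
edges = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
]

def build_graph(edges):
    """Build an undirected adjacency map (vertex -> set of neighbours)
    from a two-column edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj
```

Everything else in NodeXL (colour, shape, size, opacity) is layered on top of this same structure via the optional edge and vertex columns.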


In the next example, NodeXL has imported and displayed a graph of connections among people who follow, reply or mention one another in Twitter:


Some NodeXL Features


NodeXL Graph Gallery

Here are some graphs that were created with NodeXL. Additional images can be found at

John Crowley, Cody Dunne

Marc Smith, Pierre de Vries

Eduarda Mendes Rodrigues, Tony Capone

Tony Capone, Eduarda Mendes Rodrigues

Project Trident: A Scientific Workflow Workbench

With Project Trident, you can author workflows visually by using a catalog of existing activities and complete workflows. The workflow workbench provides a tiered library that hides the complexity of different workflow activities and services for ease of use.

Version 1.2 Now Available

Project Trident is available under the Apache 2.0 open source license.

About Project Trident

Built on the Windows Workflow Foundation, this scientific workflow workbench allows users to:

  • Automate analysis and then visualize and explore data
  • Compose, run, and catalog experiments as workflows
  • Capture provenance for each experiment
  • Create a domain-specific workflow library to extend the functionality of the workflow workbench
  • Use existing services, such as provenance and fault tolerance, or add new services
  • Schedule workflows over HPC clusters or cloud computing resources

Current Status: Microsoft Research is working in partnership with oceanographers involved in the NEPTUNE project at the University of Washington and the Monterey Bay Aquarium, to use Project Trident as a scientific workflow workbench. A workflow workbench prototype has been developed for evaluation and is in active use, while an open source version is being implemented for public release.

Learn More

Project Trident in Action—Videos

Papers, Presentations, and Articles

Partner Highlight

  • myExperiment team
    University of Manchester and Southampton University. myExperiment makes it easy to find, use and share scientific workflows and other files, and to build communities. Project Trident uses myExperiment as the community site for sharing workflows, along with provenance traces.

Background: Project Trident

Project Trident: A Scientific Workflow Workbench began as a collaborative scientific and engineering partnership among the University of Washington, the Monterey Bay Aquarium, and Microsoft External Research, intended to provide Project Neptune with a scientific-workflow workbench for oceanography. Later, Project Trident was deployed at Johns Hopkins University for use in the Pan-STARRS astronomy project.

An increasing number of tools and databases in the sciences are available as Web services. As a result, researchers face not only a data deluge, but also a service deluge, and need a tool to organize, curate, and search for services of value to their research. Project Trident provides a registry that enables the scientist to include services from his or her particular domain. The registry enables a researcher to search on tags, keywords, and annotations to determine which services and workflows—and even which data sets—are available. Other features of the registry include:

  • Semantic tagging to enable a researcher to find a service based on what it does, or is meant to do, and what it consumes as inputs and produces as outputs
  • Annotations that allow a researcher to understand how to operate the registry and configure it correctly; the registry records when and by whom a service was created and tracks its version history
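The tag-based lookup described above can be sketched as a simple filter over service records. The record fields and service names below are assumptions for illustration, not Trident's actual registry schema.

```python
# Hypothetical registry entries: each service record carries semantic tags
# plus the inputs it consumes and outputs it produces.
registry = [
    {"name": "SalinityService", "tags": {"oceanography", "salinity"},
     "inputs": ["lat", "lon"], "outputs": ["salinity_psu"]},
    {"name": "StarCatalog", "tags": {"astronomy"},
     "inputs": ["ra", "dec"], "outputs": ["magnitude"]},
]

def find_services(registry, required_tags):
    """Return the services that carry all of the requested semantic tags,
    letting a researcher find a service by what it does."""
    return [s for s in registry if required_tags <= s["tags"]]
```

A fuller version would also match on the declared inputs and outputs, mirroring the registry's "what it consumes and produces" search.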

The Project Trident registry service includes a harvester that automatically extracts Web Services Descriptive Language (WSDL) for a service, to allow scientists to use any service as it was presented. Users simply provide the Uniform Resource Identifier (URI) of the service, and the harvester extracts the WSDL and creates an entry in the registry for the service. Curation tools are available to review and semantically describe the service before moving it to the public area of the registry.

Because the end users for Project Trident are scientists rather than seasoned programmers, the workflow workbench also offers a graphical interface that enables the user to visually compose, launch, monitor, and administer workflows.

Project Trident: Logical Architecture

Trident Logical Architecture