Sunday, February 13, 2011


The Microsoft Research Audio Video Indexing System (MAVIS) is a set of software components that use speech recognition technology to enable searching of digitized spoken content, whether they are from meetings, conference calls, voice mails, presentations, online lectures, or even Internet video.


As the role of multimedia continues to grow in the enterprise, Government, and the Internet, the need for technologies that better enable discovery and search of such content becomes all the more important.

Microsoft Research has been working in the area of speech recognition for over two decades, and speech-recognition technology is integrated in a number of Microsoft products, such as Windows 7,, Exchange 2010, and Office OneNote. Using integrated speech-recognition technology in the Windows 7 operating system, users can dictate into applications like Microsoft Word, or use speech to interact with their Windows system. The service allows mobile users to get directory services using speech while on the go. Exchange 2010 now provides a rough transcript of incoming voicemails and in Office OneNote, users can search their speech recordings using keywords.

MAVIS Adds to the list of Microsoft applications and services that use speech recognition. MAVIS is designed to enable searching of 100s or even 10,000s of hours of conversational speech with different speakers on different topics. The MAVIS UI, which is a set of aspx pages, resembles that of a web search UI as illustrated below but can be changed to suite different applications.

MAVIS comprises of speech recognition software components that run as a service in the Windows Azure Platform , full text search components that run in SQL Server 2005/2008 sample aspx pages for the UI and some client side PowerShell and .NET tools. The MAVIS client side tools make it easy to submit audio video content to the speech recognition application running in the Azure service using an RSS formatted file and retrieve the results so they can be imported into a SQL Server for full text indexing which enables the audio video content to be searched just like other textual content.

MAVIS is currently a research project with a limited technical preview program. If you have deployed Microsoft SQL Server, have large speech archives and are interested in the MAVIS technical preview program, contact us.

MAVIS Architecture

Speech-Recognition for Audio Indexing Backgrounder

There are two fundamentally different approaches to speech recognition, one referred to as Phonetic indexing and the other Large-vocabulary Continuous Speech Recognition (LVCSR).

  • Phonetic indexing is based on phonetic representations of the pronunciation of the spoken terms and has no notion of words. It performs phonetic based recognition during the indexing process, and at search time, the query is translated into its phonetic spelling which is then matched against the phonetic recognition result. Although this technique has the advantage of not depending on a preconfigured vocabulary, it is not appropriate for searching large audio archives of 10,000s hours because of the high probability of errors using phonetic recognition. It is however appropriate for relatively small amounts of audio as might be the case for searching personal recordings of meetings or lectures. Microsoft has utilized this technique with success to enable the “Audio Search” feature in Office OneNote 2007.
  • Large-vocabulary continuous speech recognition or LVCSR, which is used in MAVIS, turns the audio signals into text using a preconfigured vocabulary and language grammar. The resulting text is then indexed using a text indexer. The LVCSR technique is appropriate for searching large amounts of audio archives which can be 10,000s of hours in length. The vocabulary can be configured to enable recognition of proper nouns such as names of people, places or thing.

Although LVCSR based audio search systems can provide a more accurate search result than phonetic based systems, State-of-the-art LVCSR based speech-recognition accuracy on conversational speech is still not perfect. Researchers at MSR Asia have developed a more accurate technique called “Probabilistic Word-Lattice Indexing” which takes into account how confident the recognition of a word is, as well as what alternate recognition candidates were considered. It also preserves time stamps to allow direct navigation to keyword matches in the audio or video.

Probabilistic Word-Lattice Indexing

For conversational speech, typical speech recognizers can only achieve accuracy of about 60%. To improve the accuracy of speech search, Microsoft Research Asia developed a technique called ”Probabilistic Word-Lattice Indexing,” which helps to improve search accuracy in three ways:

  • Less false negatives Lattices allow to find (sub-)phrase and ‘AND’ matches where individual words are of low confidence, but the fact that they are queried together allows us to infer that they still may be correct. Word-lattices represent alternative recognition candidates that were also considered by the recognizer, but did not turn out to be the top-scoring candidate.
  • Less false positives Lattices also provide a confidence score for each word match. This can be used to suppress low-confidence matches.
  • Time stamps Lattices, unlike text, retain the start times of spoken words, which is useful for navigation.

Word lattices accomplish this by representing the words that may have been spoken in a recording as a graph structure. Experiments show that indexing and searching this lattice structure instead of plain speech-to-text transcripts significantly improves document-retrieval accuracy for multi-word queries (30-60% for phrase queries, and over 200% for AND queries).

For more information about Microsoft Research Asia’s basic lattice method, see:

A challenge in implementing probabilistic word-lattice indexing is the size of the word lattices. Raw lattices as obtained from the recognizer can contain hundreds of alternates for each spoken word. To address this challenge MSRA has devised a technique referred to as Time-based Merging for Indexing which brings down lattice size to about 10 x the size of a corresponding text-transcript index. This is orders of magnitude less than using Raw lattices.

Tuesday, January 11, 2011

Introducing Google Instant

Google Instant is a new search enhancement that shows results as you type. We are pushing the limits of our technology and infrastructure to help you get better search results, faster. Our key technical insight was that people type slowly, but read quickly, typically taking 300 milliseconds between keystrokes, but only 30 milliseconds (a tenth of the time!) to glance at another part of the page. This means that you can scan a results page while you type.
The most obvious change is that you get to the right content much faster than before because you don’t have to finish typing your full search term, or even press “search.” Another shift is that seeing results as you type helps you formulate a better search term by providing instant feedback. You can now adapt your search on the fly until the results match exactly what you want. In time, we may wonder how search ever worked in any other way.


Faster Searches: By predicting your search and showing results before you finish typing, Google Instant can save 2-5 seconds per search.
Smarter Predictions: Even when you don’t know exactly what you’re looking for, predictions help guide your search. The top prediction is shown in grey text directly in the search box, so you can stop typing as soon as you see what you need.
Instant Results: Start typing and results appear right before your eyes. Until now, you had to type a full search term, hit return, and hope for the right results. Now results appear instantly as you type, helping you see where you’re headed, every step of the way

Saturday, January 1, 2011

NodeXL: Network Overview, Discovery and Exploration in Excel

NodeXL is a powerful and easy-to-use interactive network visualisation and analysis tool that leverages the widely available MS Excel application as the platform for representing generic graph data, performing advanced network analysis and visual exploration of networks. The tool supports multiple social network data providers that import graph data (nodes and edge lists) into the Excel spreadsheet.

The tool includes an Excel template for easy manipulation of graph data:

Sample networks generated with NodeXL:

Project contributers:

The NodeXL Template in Action

This is what the NodeXL template looks like. In this example, a simple two-column edge list was entered into the Edges worksheet, and the Show Graph button was clicked to display the network graph in the graph pane on the right.


The two-column edge list is all that’s required, but you can extensively customize the graph’s appearance by filling in a variety of optional edge and vertex columns. Here is the same graph after color, shape, size, image, opacity, and other columns were filled in.


In the next example, NodeXL has imported and displayed a graph of connections among people who follow, reply or mention one another in Twitter:


Some NodeXL Features


NodeXL Graph Gallery

Here are some graphs that were created with NodeXL. Additional images can be found at

John CrowleyCody Dunne

Marc SmithPierre de Vries

Eduarda Mendes RodriguesTony Capone

Tony CaponeEduarda Mendes Rodrigues

Project Trident: A Scientific Workflow Workbench

Project Trident imagery
With Project Trident, you can author workflows visually by using a catalog of existing activities and complete workflows. The workflow workbench provides a tiered library that hides the complexity of different workflow activities and services for ease of use.

Version 1.2 Now Available

Project Trident is available under the Apache 2.0 open source license.

About Project Trident

Built on the Windows Workflow Foundation, this scientific workflow workbench allows users to:

  • Automate analysis and then visualize and explore data
  • Compose, run, and catalog experiments as workflows
  • Capture provenance for each experiment
  • Create a domain-specific workflow library to extend the functionality of the workflow workbench
  • Use existing services, such as provenance and fault tolerance, or add new services
  • Schedule workflows over HPC clusters or cloud computing resources

Current Status: Microsoft Research is working in partnership with oceanographers involved in the NEPTUNE project at the University of Washington and the Monterey Bay Aquarium, to use Project Trident as a scientific workflow workbench. A workflow workbench prototype has been developed for evaluation and is in active use, while an open source version is being implemented for public release.

Learn More

Project Trident in Action—Videos

Papers, Presentations, and Articles

Partner Highlight

  • myExperimentmyExperiment team
    University of Manchester and Southampton University. myExperiment makes it easy to find, use and share scientific workflows and other files, and to build communities. Project Trident uses myExperiment as the community site for sharing workflows, along with provenance traces.

Background: Project Trident

Project Trident: A Scientific Workflow Workbench began as a collaborative scientific and engineering partnership among the University of Washington, the Monterey Bay Aquarium, and Microsoft External Research that is intended to provide Project Neptune with a scientific-workflow workbench for oceanography. Later, Project Trident was deployed at Johns Hopkins University for use in the Pan-STARRS astronomy project.

An increasing number of tools and databases in the sciences are available as Web services. As a result, researchers face not only a data deluge, but also a service deluge, and need a tool to organize, curate, and search for services of value to their research. Project Trident provides a registry that enables the scientist to include services from his or her particular domain. The registry enables a researcher to search on tags, keywords, and annotations to determine which services and workflows­—and even which data sets—are available. Other features of the registry include:

  • Semantic tagging to enable a researcher to find a service based on what it does, or is meant to do, and what it consumes as inputs and produces as outputs
  • Annotations that allow a researcher to understand how to operate the registry and configure it correctly; the registry records when and by whom a service was created, records its version history, and tracks its version

The Project Trident registry service includes a harvester that automatically extracts Web Services Descriptive Language (WSDL) for a service, to allow scientists to use any service as it was presented. Users simply provide the Uniform Resource Identifier (URI) of the service, and the harvester extracts the WSDL and creates an entry in the registry for the service. Curation tools are available to review and semantically describe the service before moving it to the public area of the registry.

Because the end users for Project Trident are scientists rather than seasoned programmers, the workflow workbench also offers a graphical interface that enables the user to visually compose, launch, monitor, and administer workflows.

Project Trident: Logical Architecture

Tridente Logical Architecture

Saturday, December 25, 2010

Fed up with blue screen error at least you can change color

Seeing a bluescreen that’s not blue is disconcerting, even for me, and based on the reaction of the TechEd audiences, I bet you’ll have fun generating ones of a color you pick and showing them off to your techy friends. I first saw Dan Pearson do this in a crash dump troubleshooting talk he delivered with Dave Solomon a couple of years ago and now close my Case of the Unexplained presentations with a bluescreen of the color the audience choses (you can hear the audience’s response at the end of this recording, for example). Note that the steps I’m gong to share for changing the color of the bluescreen are manual and only survive a boot session, so are suitable for demonstrations, not for general bluescreen customization. Be sure to check out the special holiday bluescreen I’ve prepared for you at the end of the post.

Preparing the System

Because you’re going to modify kernel code, the first step is to enable the ability to edit kernel code in memory if it’s not already enabled. Windows systems with less than 2 GB of RAM uses 4KB pages to store kernel code, so can protect pages with the protection most suitable for the contents they contain. For instance, kernel data pages should allow both read and write access while kernel code should only allow read and execute access. As an optimization that helps improve the speed of virtual address translations, Windows uses large pages (4 MB on x86 and x64) on larger systems. That means that if there’s both code and data stored in a page, the page must allow read, write and execute accesses, so to ensure that you can edit a page, you have to encourage Windows to use large pages. If your system is Windows XP or Server 2003 and has less than 256 MB, or is Windows Vista or higher and has 2 GB or less of RAM, create a REG_DWORD value called LargePageMinimum that’s set to 1 under HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management:


So that you don’t have to rush to show off your handiwork before Windows automatically reboots after the crash, change the auto-reboot setting. On Windows XP and Server 2003, right-click on My Computer, select the Advanced Tab, and press the Settings button in the “Startup and Recovery” section. On Windows Vista and higher, right-click on Computer in the Start Menu, select properties to open the Properties dialog, click Advanced System Settings, select the Advanced tab and press the Settings button in the “Startup and Recovery” section. Finally, uncheck the “Automatically restart” checkbox:


If you’re running 64-bit Windows Vista or higher, you need to boot the system in Debug mode so that you can run the kernel debugger in “local” mode. You can do that either by selecting F8 during the system boot and choosing the Debug boot or by checking the Debug checkbox in the System Configuration (Msconfig) utility:


Next, reboot the system and start the debugger with administrator rights (if UAC is on, run it as administrator). Point the debugger at the Microsoft symbol server by opening the Symbol Search Path dialog under the File menu and enter this string: srv*c:\symbols* (replace c:\symbols with whatever local directory in which you want the debugger to store cached symbols). Next, open the Kernel Debugging dialog from the File menu, click the Local page, and press OK:


The subsequent steps vary depending on whether you’re running 32-bit or 64-bit Windows and whether it’s Windows Vista or newer.

32-bit Windows XP and Windows Server 2003

The function that displays the bluescreen on these operating systems is KeBugCheck2. You’re looking for the place where the function passes the color value to the function that fills the screen background, InbvSolidColorFill. Enter the command “u kebugcheck2” to list the start of the function, then enter the “u” command to dump additional pages of the function’s code until you see the reference to InbvSolidColorFill (after entering “u” once, you can just press enter to repeat the command). You’ll need to dump 30-40 pages before you come across the one with the call:


Preceding the call, you’ll see an instruction that has the number 4 as its argument (“push 4”), as you can see above. Copy the code address of that instruction by selecting it from the address column on the left and typing Ctrl+C. Then in the debugger command window, type “eb “, then Ctrl+V to paste the address, then “+1”, then enter. The debugger will go into memory editing mode, starting with the address of the color value. Now you can choose the color you want. 1 is red, 2 is green, and you can experiment if you want a different color. Simply enter the number and press enter twice to commit it and exit editing mode. Here’s what the screen should look like after you’re done:


64-bit Versions of Windows and 32-bit Windows Vista and Higher

On these versions of Windows, the core bluescreen drawing function is KiDisplayBlueScreen. Type “u kidisplaybluescreen” and then continue entering “u” commands to dump pages of the function until you see the call to InbvSolidColorFill. On 32-bit versions of Windows, continue by following the instructions given in the Windows XP/Server 2003 section to find and edit the color value. On 64-bit versions of these operating systems, the instruction preceding the call to InvbSolidColorFill is the one that passes the color, so copy its address (the number in the left column) and enter this command to edit it: “eb

+4”. The debugger will go into memory editing mode and you can change the value (e.g. 1 for red, 2 for green):


Viewing the Result

You’re now ready to crash the system. If you’re running 64-bit Windows, you might get a crash without doing anything additionally. That’s because Kernel Patch Protection will notice the modification and crash the system as a deterrent to ISVs that might consider modifying the kernel’s code to change its behavior. There might be a delay of up to several minutes before that happens, though. To generate a crash on demand, run the Notmyfault tool (you can download it from the Windows Internals book page) and press the “Do Bug” button (to avoid data loss, make sure you’ve saved any work and closed all other applications):


You’ll now get a bluescreen in the color you picked, in this case the red screen of death:


The Holiday Bluescreen

In the spirit of the holiday season, I took things one step further to generate a holiday-themed bluescreen: not only did I modify the background color, but the text color as well. To do this on 64-bit versions of Windows Vista or higher, note the call to InvbSetTextColor immediately following the one to InvbSolidColorFill and the address of the instruction that passes the text color to the function, “move ecx, 0Fh”:


The 0Fh parameter represents white, but you can change it using the same editing technique. Use the “eb” command, passing the address of the instruction plus 1. Here I set the color to red (which is a value of 1):


And here’s the festive bluescreen I produced:


Happy holidays! And remember, if you have any troubleshooting cases you want to share, please send me screenshots (.PNG preferred) and log files.

Read more at Mark's Blog ...