About This Page
- What is this? This is a working prototype intended to spark conversation about new search features that could power site search for the IRS.gov website. We will also use these results to test search relevance against a number of other possible solutions. The prototype currently uses Google Custom Search, with a limited set of domains that can populate the search results.
- DISCLAIMER: This site is not affiliated with the Internal Revenue Service of the United States of America. It is not intended to be, nor should it be, used to find tax-related information. You are strongly encouraged to visit the IRS website for your tax information needs.
- What is Annapurna? Wikipedia page for Annapurna
- Why Annapurna? No good reason. It was a word that popped into my head when I decided this thing needed a name.
Key Features of this Demo
This particular demo of an IRS.gov site search engine takes advantage of a number of features in the Google Custom Search interface. They are described in more detail here, along with sample queries you can use to see each feature in action.
- Restricted search index: This is the most elemental feature of the Google Custom Search technology. In a nutshell, this feature enables the website owner to carve out a slice of Google’s massive search index which includes the entirety of Google’s view of the Internet. By specifying a very limited set of domains (or as few as just one), a search index can be offered to consumers that is highly focused on a site or set of highly related sites. As of this writing, this demo is limited to the IRS website (and its sub-domains), eftps.gov, efile.com, treasury.gov and so on.
- To see this in action: Try searching for [treasury] to see results from irs.gov and treasury.gov. Or try searching for [efile] to see results from irs.gov and efile.com.
- Auto-completion: This feature is evident as soon as you start typing into the search box. The intention of this feature is to suggest popular queries that other users have used to find content on these sites. There are two groups of queries that appear here: those that Google automatically suggests and those that we as site owners can define and prioritize.
- To see this in action: Try typing the word [form ] into the search box (be sure to hit the space character after the word “form”) and then pause for a moment.
- Notice the list of suggested form queries that appears. That specific list and its order were set by me based on two key criteria: these are words that are known to be highly relevant to the IRS website, and they are the most popular queries within the group of keywords beginning with the word “form”.
- You can also compare the suggestions that show up on this search site with the suggested form queries that appear on the main Google.com search site.
- Search refinements: A “search refinement” as it is used here is another way of slicing the list of allowable results. If the first slice was at the domain level (we created a sub-set of the web by constraining it to just a small number of domains related to the IRS), you can think of search refinements as a sub-set of a sub-set. In this case, we’re categorizing domains and sections of domains as belonging to a specific group or classification. The first place you’ll see these is in the list of tabs that appears just below the search box after you’ve performed a search. The other place you’ll run into refinements is under search results that have been associated with one or more specific categories.
- To see this in action: Perform a search such as [tax preparation] and notice the links that appear under each of the search results.
- You’ll see categories such as “Tax Filing”, “Businesses”, “News” and so on.
- Click on the “Tax Filing” link to see the result set constrained to only results in the “Tax Filing” category. Clicking one of these links causes the same effect as clicking on the “Tax Filing” tab under the search box.
- Promoted results: Sometimes one or more documents or web pages must, more than any other, be promoted to the top of the search results. These results are sometimes referred to as “editorially programmed” search results (because that’s exactly what they are). The reason to take advantage of this feature is to ensure that the absolute best result is presented to your users first.
- To see this in action: Perform a search such as [form 1099] and notice the link that appears in the first search result position with the pale yellow background. That is an editorially programmed result set up specifically for [form 1099] and similar queries.
- The query  also has a second promoted result to see what it would look like if more than one promoted result is programmed.
- There is also supposed to be an image there but for some reason, I can’t get that to show up just yet.
- Synonyms: Synonyms are useful when several variations of a query should be executed together, so that all relevant documents count as matches regardless of which variation the user typed. In a health context, consider that some articles may discuss “high blood pressure” while others discuss “hypertension”. Without synonyms, a search for [high blood pressure] will likely fail to match a document that is all about hypertension (and vice versa). When a synonym has been created, a search for one will include the other with a Boolean OR. For example, if the user searches for [high blood pressure], what is going on behind the scenes is a query of [high blood pressure OR hypertension].
- To see this in action: Perform a search such as [request for taxpayer identification number] or [request for taxpayer identification]. Since this is the name of a very popular form known as Form W-9, it is helpful to expand the search query to include [w-9] in the query.
- In this case, the query becomes [request for taxpayer identification number OR w-9]
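The synonym expansion described above can be sketched in a few lines of Python. This is only an illustration of the Boolean OR idea, not how Google Custom Search implements it: the `SYNONYM_GROUPS` structure and the `expand_query` function are names I made up, and matching on the whole query (rather than on individual terms) is a simplifying assumption.

```python
# Hypothetical synonym groups. The two examples come from the text above,
# but the data structure and function name are illustrative assumptions.
SYNONYM_GROUPS = [
    ["high blood pressure", "hypertension"],
    ["request for taxpayer identification number", "w-9"],
]


def expand_query(query: str) -> str:
    """Return the query joined with its synonyms using a Boolean OR."""
    # Normalize case and whitespace so "High  Blood Pressure" still matches.
    normalized = " ".join(query.lower().split())
    for group in SYNONYM_GROUPS:
        if normalized in group:
            others = [term for term in group if term != normalized]
            return " OR ".join([normalized] + others)
    # No synonym group matched: return the normalized query on its own.
    return normalized


print(expand_query("high blood pressure"))
print(expand_query("w-9"))
```

Because each group is treated as a set of equivalents, the expansion works in both directions: a search for [hypertension] picks up [high blood pressure] as well.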
Available Features of Google CSE not used in this Demo (yet)
There are a few other known features of the Google Custom Search product that this demo has yet to take advantage of.
- On-demand Indexing: The CSE interface supports the use of directly submitted URLs and XML Sitemaps to ensure all desired URLs are getting indexed by your custom search instance.
- Remove URLs: This is the converse of on-demand indexing. With this feature, you can explicitly define URLs that should be removed from your search index.
- Search query based refinements: This feature is an extension of the search refinements feature described above. The importance of this capability is that you can use query strings (in addition to URL patterns) to further constrain which URLs are acceptable matches.
- Promotions expiration date: A nice feature of promoted content is that you can set an expiration date so that a promotion stops running automatically.
- Excluded autocompletions: This is a handy feature to make sure that nothing embarrassing or illegal shows up as a suggested query in the auto-complete drop-down list. Words can be excluded with exact match or pattern matching.
- Annotations: I suppose the best way to describe this feature is as a mechanism that lets third-party site owners decide if they would like to be included in these search results. Typically, the person setting up the CSE decides which sites he or she wants to include, and others really have no direct ability to force the inclusion of their own content. But if I publish the specific code that represents this search engine, then any site owner could insert that code onto their page and my custom search engine above would include their content as well.
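To make the excluded autocompletions item in the list above a little more concrete, here is a rough Python sketch of how owner-defined suggestions, automatic suggestions and exclusions might fit together. Everything in it is an assumption for illustration: the sample lists, the `suggest` and `is_excluded` names, and the rule that owner-defined suggestions are listed first. It does not describe the actual Google Custom Search behavior.

```python
import re

# Hypothetical data: none of these lists come from the real demo configuration.
OWNER_SUGGESTIONS = ["form 1040", "form w-9", "form 1099"]
AUTO_SUGGESTIONS = ["form 1040", "form 4868", "formal complaint", "form placeholder-banned-term"]

# Exclusions by exact match and by pattern, as the feature description mentions.
EXCLUDED_EXACT = {"formal complaint"}
EXCLUDED_PATTERNS = [re.compile(r"banned")]


def is_excluded(suggestion: str) -> bool:
    """True if the suggestion is blocked by an exact match or a pattern."""
    if suggestion in EXCLUDED_EXACT:
        return True
    return any(p.search(suggestion) for p in EXCLUDED_PATTERNS)


def suggest(prefix: str, limit: int = 5) -> list[str]:
    """Return suggestions for a typed prefix, owner-defined entries first."""
    prefix = prefix.lower()
    seen = set()
    results = []
    # Owner-defined suggestions are scanned first, so they keep priority.
    for suggestion in OWNER_SUGGESTIONS + AUTO_SUGGESTIONS:
        if not suggestion.startswith(prefix):
            continue
        if suggestion in seen or is_excluded(suggestion):
            continue
        seen.add(suggestion)
        results.append(suggestion)
    return results[:limit]


print(suggest("form "))
```

Typing [form ] with the trailing space keeps only entries that begin with the whole word “form”, which is why the sample query in the auto-completion section asks you to hit the space bar.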
Other IRS.gov site search resources
- IRS.gov Advanced Site Search Features
- [form 1040] sample search query on IRS.gov
- Blekko IRS slashtag
- [form 1040] sample search query with Blekko IRS slashtag
- USASearch – USA.gov’s Search Engine (powered by Bing)
- USASearch – APIs and Web Services
- More to come!
Working List of Domains to Include in IRS Site Search
- Internal Revenue Service
- United States Department of Justice – Tax Division
- Internal Revenue Service – Real and Personal Property Sales – aka IRS Auctions
- United States Treasury Inspector General for Tax Administration – TIGTA
- IRS Oversight Board
- United States Department of The Treasury – IRS Related Scams
- EFTPS Online – Electronic Federal Tax Payment System
- United States Tax Court
- IRS Careers
- IRS Video Portal Home Page
- Earned Income Tax Credit
- IRS Retirement Plans Navigator
- FreeFile – Prepare and File Your Taxes Online For Free
- StayExempt – Tax Basics for Exempt Organizations
- Taxpayer Advocate Service – TAS
- IRS Tax Map
What We’re Reading
A running list of articles we find on the web that will help the team make more informed decisions.
- How to Use Blekko to Rock at Your Job by Marshall Kirkpatrick – November 6, 2010
- Market Overview: Enterprise Search by Leslie Owens of Forrester Research – September 2, 2011 (We have the full report courtesy of Vivisimo – the enterprise search provider that scored highest according to Forrester)
Google Custom Search On The Fly
The search box below is powered by a search index that uses all of the links on this page to determine which sites should be used in the search index. Interesting idea. All I have to do is add a link on the page and Google will include that site in this search engine.
- To see this in action: Try searching for [myron rosmarin] to see results from rosmarin-search-marketing.com. It is the one site guaranteed to be included only because of links existing in the header at the top of this page.
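The idea of building the index from the links on a page can be sketched roughly as follows. I don't know how Google does this internally, so treat this as an assumption-laden illustration: the `LinkCollector` class, the `sites_linked_from` function and the sample HTML are all made up, and stripping a leading “www.” is my own simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the hostnames of all absolute links found in a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Relative links such as "/about" have no hostname and are skipped.
        host = urlparse(href).hostname
        if not host:
            return
        if host.startswith("www."):
            host = host[len("www."):]
        self.hosts.add(host)


def sites_linked_from(html: str) -> list[str]:
    """Return the sorted list of sites a page links to."""
    collector = LinkCollector()
    collector.feed(html)
    return sorted(collector.hosts)


# Made-up sample page, using domains mentioned elsewhere on this page.
sample_page = """
<a href="https://www.irs.gov/forms">IRS forms</a>
<a href="https://www.treasury.gov/">Treasury</a>
<a href="/about">About this page</a>
"""
print(sites_linked_from(sample_page))
```

Under this model, adding one more link to the page is all it takes to add one more site to the list.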
Blekko /irs Slashtag Search
This search box is powered by Blekko’s /irs slashtag whose index is using the list of links described above. The results are viewable on Blekko’s site (but I’m working on changing that to this page).
- To see this in action: Try searching for [estimated tax payments] to see results.