Effective use of Search Engines for Genealogy
Presentation by Anne Lehmkuhl
West Gauteng branch of the Genealogical Society on 22 January 2005
The Internet is a huge collection of information and it is growing by the day. For a genealogist, it can be a formidable task to locate valuable information. There are hundreds of search engines out there. Most of them do not provide much information of use to genealogists but some have a wealth of information if you know how to find it. One of the biggest mistakes people make when beginning to research their family tree online is believing all of the information they need will appear at a click of their mouse.
1 SEARCH ENGINES - HOW THEY WORK & WHAT THEY DO
The Internet is the largest information repository in the world. The amount of information available often makes it difficult to find the specific information you want. More than a million pages of information are added to the Internet each day. It’s no wonder that searching the Internet is like looking for a needle, but not in a haystack: in a meadow! By learning how to search effectively, you'll spend a lot less time chasing dead ends. The Internet provides various ways to search for content.
What is a Search Engine? A search engine is a searchable database of Websites collected by a computer program (called a crawler, robot or spider). Search engines are the card catalogues of the Internet. The crawler reads millions of Web pages and indexes the text they contain into a very large database. Because Web documents are one of the least static forms of publishing (they change a lot), crawlers also update previously catalogued sites.
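To make the idea of an index concrete, here is a minimal Python sketch of a crawler's database in miniature. The page addresses and text are invented for illustration, and the function name build_index is my own; real search engines are far more elaborate.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lower-cased word to the set of page addresses containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for raw in text.lower().split():
            word = raw.strip('.,;:!?()"')  # drop punctuation stuck to the word
            if word:
                index[word].add(url)
    return index

# Hypothetical pages, standing in for what a crawler has read.
pages = {
    "example.org/a": "Cronje family history, Cape Town",
    "example.org/b": "Cape Cod cemetery records",
}
index = build_index(pages)
print(sorted(index["cape"]))  # both pages mention "cape"
```

Looking up a keyword is then a matter of reading one entry from the index rather than re-reading every page, which is why a search engine can answer so quickly.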
Different search engines have different strengths in searching different types of information. The most powerful search engines for research purposes are, in my opinion, the advanced versions of Google, Northern Light, FASTSearch and Alta Vista. Two little-known search engines favoured by researchers are Teoma and Vivisimo (really Meta search engines).
One of Google’s most useful features for genealogists is the cached feature. When you’ve done your search, if the Web page the link gives you is no longer available, click on the Cached link and you will see the page as last cached by Google.
Northern Light: http://www.northernlight.com/
This is the professional researcher’s favourite, because it organises material so well by topic. It is based in Canada.
FASTSearch scans the Web every 7 to 11 days to ensure that it has fresh content and that there are no broken links. It supports searching in 49 different languages.
Alta Vista: http://www.altavista.com/
Also has a translation feature.
The Ananzi search engine is devoted to South African Web sites and was created early in 1996, making it the first South African search engine.
Search directories divide Websites into topics and subtopics, eg Arts, Science, Health, Business, and News. A directory shows you only the sites that have been listed in it at the time, whereas a search engine can locate sites all over the Web. Directories such as Yahoo are created by cataloguing information submitted by individuals or companies. Their strength lies in their structure of topics and subtopics.
Yahoo is the most popular search directory, although most people think it is a search engine. It is the largest guide to the Web and is compiled by 80+ editors who categorise Websites they come across.
META SEARCH UTILITIES
Another powerful technique for searching is to use Meta search utilities, which send your query to more than one search engine at the same time. Copernic, a Windows program, at http://www.copernic.com is a popular one. It takes complex search terms and contacts multiple search engines at the same time. The results are collated and further analysed by the program itself to perform search term logic not supported by individual search engines. It has advanced management features like filtering, grouping and summarising. It can also give you email alerts when Websites change or when new pages relevant to your searches are found.
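The collating step can be pictured with a short Python sketch. The engine names and result lists are made up, no real search engine is contacted, and this is only one plausible way of merging results, not a description of how Copernic itself works.

```python
def collate(results_by_engine):
    """Merge ranked result lists from several engines, dropping duplicates.

    Addresses returned by more engines come first; ties are broken by
    the best (lowest) position the address reached in any list.
    """
    votes = {}      # address -> number of engines that returned it
    best_rank = {}  # address -> best position seen (0 = top of a list)
    for results in results_by_engine.values():
        seen = set()  # count each address once per engine
        for rank, url in enumerate(results):
            if url in seen:
                continue
            seen.add(url)
            votes[url] = votes.get(url, 0) + 1
            best_rank[url] = min(rank, best_rank.get(url, rank))
    return sorted(votes, key=lambda u: (-votes[u], best_rank[u], u))

# Hypothetical results from two engines for the same query.
results_by_engine = {
    "engine_a": ["site1.example", "site2.example", "site3.example"],
    "engine_b": ["site2.example", "site4.example"],
}
print(collate(results_by_engine))
```

Here site2.example rises to the top because both engines returned it, which is the main benefit of querying several engines at once.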
THE INVISIBLE WEB
The Invisible Web is the term used for Web pages that aren’t found by using conventional search engines or directories. It is best explored through subject-specific search engines or directories. The Librarian’s Index at http://lii.org/ is a very useful one, as is The Invisible Web Directory at http://www.invisible-web.net/
Another little-known aid for genealogists is the Internet Archive at http://www.archive.org/ This site allows you to search for lost Web pages. It contains 10 billion Web pages archived from 1996 to the present. To use it, you type the URL (the Website address) of the lost page into the search box.
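The Archive's Wayback Machine can also be reached by typing an address directly. The Python sketch below builds such an address. The web.archive.org/web/ layout is an assumption about how the service arranges its addresses and may change; the function name and the example page are my own.

```python
def wayback_address(url, timestamp="*"):
    """Build a Wayback Machine address for a lost page.

    timestamp is a date such as 20050122 (YYYYMMDD); the default "*"
    asks for the list of all archived copies of the page.
    """
    return f"https://web.archive.org/web/{timestamp}/{url}"

# Hypothetical lost page.
print(wayback_address("http://www.example.com/family.htm"))
print(wayback_address("http://www.example.com/family.htm", "20050122"))
```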
2 NEEDLES IN HAYSTACKS - CREATING EFFECTIVE SEARCH TERMS
GENERAL SEARCH TIPS
Most people have a basic understanding of how to use search engines - they type in what they’re looking for and click "Go" or "Search". This works, but depending on your query, you usually get thousands of sites returned. Wading through many sites before finding what you want wastes time. When you get a large number of search results, concentrate on the top 10 or 20. This way you increase the chances of finding the needle in the haystack.
Here are some general tips for effective searching:
If you misspell the words you’re looking for, you might still find information but it will likely take longer or be unrelated to your query. Remember that spelling may vary, depending on where the Website was created - in the USA or in the UK - both English but with different spelling (eg colour vs color, and favourite vs favorite). Another tip is choice of language - sometimes you won’t find what you're looking for in English, so try an Afrikaans search.
Most search engines treat lower case search phrases as universal, but will perform a case sensitive search if you capitalise any letter. It is better to use lower case letters in your searches. Example: paint will match paint, Paint, paINt, and so on; Paint will match only Paint.
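The case rule can be written out as a small Python sketch. It models the rule exactly as described here; individual search engines may behave differently, and the function name is my own.

```python
def case_rule_match(query, word):
    """An all-lower-case query ignores case; a query containing any
    capital letter must match the word exactly."""
    if query == query.lower():
        return query == word.lower()
    return query == word

for word in ["paint", "Paint", "paINt"]:
    print(word, case_rule_match("paint", word), case_rule_match("Paint", word))
```

The lower-case query matches all three spellings, while the capitalised query matches only Paint.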
Search engines facilitate finding information by the use of keywords. If you’re using a directory for your search, you don’t have to use keywords as you follow only the subject links provided by the directory. Type as many keywords as you can think of for your query. If you don’t know what a vintage jam pot looks like, you could use vintage jam pot as your search query. When queried like this, search engines will return pages containing any of your keywords, and those that contain them all are usually listed first. This is the type of search most people do, but it is not the most effective way.
Advanced searches are controlled by Boolean operators and certain keywords. Most search engines support Boolean searching. Boolean operators are words and symbols that, when used in conjunction with the keywords you're searching for, help to pinpoint your information. The main words are AND, OR and AND NOT. Most search engines allow substitute symbols such as + representing AND, a space representing OR, and - (minus) representing AND NOT. Here are a few examples, using Google:
AND (+)
+vintage +jam +pot
Returns Websites containing all 3 words (approximately 48 000 sites). The + symbol forces a keyword to be included in the search results. Note there is no space between the + symbol and the keyword. Google does an AND search by default.
OR (a space)
vintage jam pot
Returns Websites containing any of the three words, with those containing all three ranked highest (approximately 48 200 sites). AltaVista does an OR search by default.
AND NOT (-)
-vintage +jam +pot
Returns Websites that contain the words jam and pot but do not contain the word vintage (approximately 752 000 sites). The - symbol forces a keyword to be excluded from results. This is an extremely powerful research tool when you learn how to use it properly with the + symbol.
Quotation marks are another useful tool. What is the difference between using them and not using them? If you search for "vintage jam pot" using quotation marks, search engines treat your query as an exact phrase and you get about 47 Websites listed. If you don’t use quotation marks, search engines return about 48 200 Websites!
You can string Boolean operators together for more complex, focused searches.
Typing +jam +pot +England +bone -vintage returns Websites mentioning the first four words but not the last word (approximately 19 300 sites). To narrow the search further, you can type +"jam pot" +bone -vintage, which returns Websites that contain the phrase jam pot, the word bone, and definitely will not contain the word vintage (approximately 1 230 sites).
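How the +, - and quotation mark symbols combine can be shown with a small Python sketch that tests one page against a query. It is a simplified model under my own assumptions (words without a symbol are treated as required, as in an AND-by-default search), not the workings of any particular search engine, and the sample page is invented.

```python
import re

# An optional + or - sign, then either a "quoted phrase" or a single word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')

def normalise(text):
    """Lower-case the text and reduce punctuation to single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))

def parse_query(query):
    """Split a query into (required, excluded) lists of terms."""
    required, excluded = [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = normalise(phrase or word)
        (excluded if sign == "-" else required).append(term)
    return required, excluded

def page_matches(query, page_text):
    """True if the page has every required term and no excluded term."""
    padded = f" {normalise(page_text)} "
    required, excluded = parse_query(query)

    def has(term):
        # Whole words only; a phrase must appear with its words adjacent.
        return f" {term} " in padded

    return all(has(t) for t in required) and not any(has(t) for t in excluded)

page = "A bone china jam pot made in England"  # hypothetical page text
print(parse_query('+"jam pot" +bone -vintage'))
print(page_matches('+"jam pot" +bone -vintage', page))
```

A page reading "a pot of jam" would fail the phrase test even though it contains both words, which is why quotation marks cut the number of results so sharply.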
An asterisk * is a wildcard when doing a search. It is placed on the right-hand side of a word or embedded within a word with at least three characters to the left. Use an asterisk to find various spellings or related words. Example: paint* would return matches of paint, paints, painter and painting.
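The wildcard behaviour can be imitated in Python with the standard fnmatch module. This is only a sketch: the word list is invented, and the three-character check simply follows the rule as stated here.

```python
from fnmatch import fnmatchcase

def wildcard_matches(pattern, words):
    """Return the words matched by a pattern containing * wildcards."""
    if "*" in pattern and pattern.index("*") < 3:
        raise ValueError("need at least three characters before the *")
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

words = ["paint", "paints", "painter", "painting", "pain", "repaint"]
print(wildcard_matches("paint*", words))
print(wildcard_matches("colo*r", ["color", "colour", "colder"]))
```

An embedded asterisk, as in colo*r, is a convenient way to catch both the British and American spellings in one search.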
Use brackets to group words and operators so that part of a query is treated as a unit. Example: cape AND (cod OR town) lists Websites about Cape Cod or Cape Town.
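The effect of the brackets can be seen in a small Python sketch using sets of pages. The page addresses and text are invented, and pages_with is a helper of my own.

```python
def pages_with(word, pages):
    """Addresses of the pages whose text contains the word."""
    return {url for url, text in pages.items() if word in text.lower().split()}

# Hypothetical pages.
pages = {
    "a.example": "Cape Town marriage registers",
    "b.example": "Cape Cod cemetery records",
    "c.example": "Cape Verde shipping lists",
    "d.example": "Town hall minutes",
}

cape = pages_with("cape", pages)
cod = pages_with("cod", pages)
town = pages_with("town", pages)

# cape AND (cod OR town): & is AND, | is OR, and the Python brackets
# play the same grouping role as the brackets in the query.
hits = cape & (cod | town)
print(sorted(hits))

# Without the brackets the AND is applied first, so the unrelated
# "Town hall" page is let in.
print(sorted(cape & cod | town))
```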
Field searches look for your search term in one specific part of a Web page. You use a field name followed by a colon and then a search term. Valid field names include link (that will find the search term in links only), title (searches only in titles of pages), url (looks only in URLs), alt (looks in labels of images). If I want to know which Websites link to my Website I’d use link:www.rupert.net/~lkool/ to see the results. Field names can be useful for genealogy.
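A field search is simply a string in the form field:term, as this Python sketch shows. The function name is my own, and wrapping a multi-word term in quotation marks is an assumption; check the help pages of the search engine you use, as not every engine supports every field.

```python
VALID_FIELDS = {"link", "title", "url", "alt"}  # the field names listed above

def field_query(field, term):
    """Build a field search such as title:cronje."""
    if field not in VALID_FIELDS:
        raise ValueError(f"unknown field name: {field}")
    if " " in term:
        term = f'"{term}"'  # assumption: phrases need quotation marks
    return f"{field}:{term}"

print(field_query("link", "www.rupert.net/~lkool/"))
print(field_query("title", "van der merwe"))
```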
3 DIGGING DEEPER - FINDING GENEALOGICAL GEMS ONLINE
Searching for information online is easy when you develop a detective’s mind. Learning how to translate your problems into appropriate keywords and symbols to use in search engines is like a switch being turned on in your mind and then the keys to the Internet are yours.
The staple of genealogical research is records - births, marriages and deaths. Using your name or surname with the keywords born, died or married brings up mostly Websites containing genealogical data. Adding the place name makes it more effective. Using Google I could try +"van der merwe" +married +born +died +"south africa" to find 602 Websites containing genealogical information. Changing the search query to +"van der merwe" +trou +gebore +sterf +"suid-afrika" gives me 33 results and they are Afrikaans sites (Van der Merwe being predominantly Afrikaans).
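If you run this kind of search for many surnames, the query can be assembled by a few lines of Python. This is a sketch only: the function names are my own and the keyword lists are just the English and Afrikaans words used in the searches described here.

```python
RECORD_WORDS = {
    "en": ["married", "born", "died"],
    "af": ["trou", "gebore", "sterf"],
}

def plus(term):
    """Prefix a term with +, quoting it if it has a space or hyphen."""
    needs_quotes = " " in term or "-" in term
    return f'+"{term}"' if needs_quotes else f"+{term}"

def records_query(surname, place, language="en"):
    """Build a births, marriages and deaths query for a surname and place."""
    terms = [surname] + RECORD_WORDS[language] + [place]
    return " ".join(plus(t) for t in terms)

print(records_query("van der merwe", "south africa"))
print(records_query("van der merwe", "suid-afrika", language="af"))
```

Keeping the keyword lists in one place makes it easy to add another language or to swap in other record words such as buried or cemetery.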
Fine-tuning is necessary for the detective’s mind to find genealogical gems buried deep within the Internet. When you first try your search query, you may or may not see the results you want. Sometimes it takes 3 to 5 search queries to find what you want - this is part of the fine-tuning process. Enter keywords, examine the results. Add more keywords, examine the results, and so on. Sometimes you may have to remove a keyword, but usually you will be adding them. If you get 20 000 results, concentrate on the top 10 or 20 results. With fine-tuning, you get what you want to come to the top of the search results, making it easier to find the needle in the haystack.
Here is an example of fine-tuning:
I’m looking for the name of Hansie Cronje’s (the late South African cricket captain) mother. I don’t know what her name is.
I start my search with "hansie cronje" +mother
I receive about 629 results using Google.
The first 3 sites show her name as San-Marie, so now I can fine-tune my query to "hansie cronje" +"san-marie" This returns about 40 results but none of them give me her maiden name.
Further fine-tuning is required, so I try ewie +sanmarie
Note that I’m using his father’s name (Ewie), which I found in a previous search result, and I’ve also combined San-Marie into one word (a possible Afrikaans spelling, this being an Afrikaans family).
This gives me 9 results and one of them shows that Sanmarie’s names are Susanna Maria.
More fine-tuning using "wessel johannes" +cronje +"susanna maria"
Note I’ve now used Hansie’s actual names. This returns 16 results, with the very first site listed containing my genealogical gem - Hansie’s family tree back to the stamvader and his mother’s details too! http://www.geocities.com/hugenoteblad/cronje/cronje9.htm
Another fine-tuning trick: When you have a name or surname that is the same as that of a famous person, you will get hundreds of results to wade through. To cut out the results referring to the famous person, you would have to use appropriate keywords and symbols. Here’s an example: If your surname is Reagan you will see that when you start doing searches many of the results are about President Ronald Reagan. To cut those out, use +reagan +born -president. Now the word president will not be in any of the results because it was forced to be excluded.
Another example is a common surname such as Morse or Cook, which is also the name of an unrelated thing. To fine-tune this you could use:
+morse +married -code (the word code is excluded because you don't want results about Morse code).
+cook +died -food -chef (the words food and chef are excluded).
Other general keywords that are useful for genealogical searches include genealogy, stamvader, ancestors, descendants, family, history, certificate, buried, cemetery. Adding these to your specific keywords (name, surname, place name, year) results in more effective searches. Choosing significant keywords for genealogical searches is important.
Digging deeper means you’ll not only look at Websites. There are email groups (many at Yahoo), newsgroups, and databases that may not show up in searches. These need to be accessed directly once you know their URL. Examples include online newspaper archives, image search engines (Google has a good one), library catalogues, online telephone directories, and street directories. You can search newsgroups (back to 1995) via Google, by subject, author or specific newsgroup at http://www.google.co.za/grphp
Genealogy newsgroups are classified under the soc. main heading and then under the soc.genealogy sub-heading. Be aware that newsgroups that are not strictly moderated contain a lot of spam.
Despite the importance of the Internet in establishing contacts and finding information, this valuable research tool is still in its infancy when it comes to South African genealogy.
© 22 January 2005