String Examples

Strings enable you to create “rules” or “guidelines” that instruct the algorithm on how to handle certain incoming data values. Some examples of strings and their uses are described in this topic.

ADDR-TOK

This is only an example and does not contain all possible configurations.

Address tokens are used to standardize portions of a member address for comparison and bucketing. Because the contents of addresses can vary (for example, a single address can be entered as 124 W. Park Dr. or 124 West Park Drive), standardizing certain elements enables a common format to be used during comparison. For example, “Apartment” can be standardized to “APT,” or “Drive” to “DR.” Address tokens are defined in mpi_strword.

There is a difference between tokens and abstracts. Tokens are items extracted from a “unit” of information, such as the word “Drive” in an address. Abstracts are items from which we derive a meaning, such as “DDS” meaning “Dentist.”
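
The effect of address tokens can be illustrated with a minimal Python sketch. This is not how InfoSphere MDM implements standardization; the ADDR_TOKENS dictionary and the standardize_address function are hypothetical names, and the token values stand in for rows that would be defined in mpi_strword.

```python
# Hypothetical token map for illustration only; real tokens are defined in mpi_strword.
ADDR_TOKENS = {
    "WEST": "W",
    "DRIVE": "DR",
    "APARTMENT": "APT",
}


def standardize_address(raw: str) -> str:
    """Uppercase the address, drop periods and commas, and replace known tokens."""
    words = raw.upper().replace(".", "").replace(",", "").split()
    return " ".join(ADDR_TOKENS.get(word, word) for word in words)


if __name__ == "__main__":
    # Both spellings of the same address reduce to one common format.
    print(standardize_address("124 W. Park Dr."))
    print(standardize_address("124 West Park Drive"))
```

In this sketch, both ways of entering the address yield the same standardized string, which is what makes the two records comparable.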

ADDR2-MANON

Occasionally when creating a member record, a street address might not be available. In that case, a registrar might enter a placeholder such as “1234 Main Street, Anytown, NY 12345” as the address. Providing a list of anonymous address values enables InfoSphere® MDM to ignore the value and not assign a score. To do this, use the string code whose Code value is ADDR2-MANON. Anonymous address values are defined by adding the anonymous addresses to the string value file associated with the ADDR2-MANON string code. Values are stored in the mpi_stranon table. Anonymous addresses must be entered in standardized format.
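
As a rough sketch of the idea, the following Python treats an anonymous value as “no value” so that it contributes nothing to scoring. The names ANON_ADDRESSES and address_for_scoring are hypothetical, and the standardized form shown for the placeholder address is an assumption, not the format the product produces.

```python
from typing import Optional

# Hypothetical anonymous values, assumed to be in standardized format already.
# In the product, these values are stored in mpi_stranon.
ANON_ADDRESSES = {
    "1234 MAIN ST ANYTOWN NY 12345",
    "UNKNOWN",
}


def address_for_scoring(standardized: str) -> Optional[str]:
    """Return None for anonymous addresses so they are left out of comparison."""
    return None if standardized in ANON_ADDRESSES else standardized
```

Because the lookup is an exact match, a value that has not been standardized the same way as the list entries would not be recognized, which is why the anonymous values must be entered in standardized format.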

ANAME

The Anonymous Name string code, with the aname.txt string value file, is used when creating a member record for which certain name information is not available. For example, when creating a member record for a newborn, a registrar might enter “Babyboy” as the first name. Providing a list of anonymous name values enables InfoSphere MDM to ignore the value and not assign a score. Anonymous name values are defined in the mpi_stranon database table.

BXNM-ABS

The BXNM-ABS string code is used to define and manage the use of abstracts for specialty codes in InfoSphere MDM. During a name standardization process, each word or token making up a member or business name is examined to see if a “mapping” exists to an abstract value. If so, the abstract value is added to the comparison data. The mappings for a name token to abstracts are defined in the mpi_strequi table.

Mappings for attributes to abstracts in mpi_strequi are slightly different from other mappings. Because attribute values might not be the same across multiple sources, the mpi_strequi.strcode field is used to define whether the mapping applies across all sources or only to a specific source. The mpi_strequi records have the following formats:
  • For global mappings across all sources: strcode = SA0-ABS; strval1 = abstract value; strval2 = attribute value
  • For source-specific mappings: strcode = SA<srcrecno>-ABS; strval1 = abstract value; strval2 = attribute value

For example, a specialty code attribute value “PST” might map to an abstract value of “RHB” for Source 13, whereas in Source 14 “5C” might map to “RHB.” To accomplish this mapping, define an mpi_strequi record, strcode=SA13-ABS, to map “PST” to the abstract “RHB” and another record with strcode=SA14-ABS to map “5C” to “RHB.”

Each mapping must be defined individually.
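
The strcode convention above can be sketched in Python as a simple lookup. The rows mirror the Source 13 and Source 14 example; the SA0-ABS row, the function name lookup_abstract, and the choice to check the source-specific mapping before the global one are assumptions made for illustration, not documented product behavior.

```python
from typing import List, Optional, Tuple

# Rows shaped like mpi_strequi records: (strcode, strval1 = abstract, strval2 = attribute).
STREQUI_ROWS: List[Tuple[str, str, str]] = [
    ("SA13-ABS", "RHB", "PST"),  # source-specific mapping for Source 13
    ("SA14-ABS", "RHB", "5C"),   # source-specific mapping for Source 14
    ("SA0-ABS", "DEN", "DDS"),   # hypothetical global mapping, not from the product
]


def lookup_abstract(srcrecno: int, attr_value: str) -> Optional[str]:
    """Find the abstract for an attribute value from a given source.

    Assumption: the source-specific strcode is tried first, then SA0-ABS.
    """
    for strcode in (f"SA{srcrecno}-ABS", "SA0-ABS"):
        for code, abstract, value in STREQUI_ROWS:
            if code == strcode and value == attr_value:
                return abstract
    return None
```

In this sketch, “PST” resolves to “RHB” only for Source 13, so the same code arriving from Source 14 finds no mapping.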

BXNM-TOK

This string value is used to standardize certain components of a business name. BXNM-TOK values are stored in mpi_strword. Remember that tokens are items extracted from a unit of information.

BYR

Some implementations, for example those using the InfoSphere MDM Offender algorithm, compare on the birth year of a member (without day and month). This string enables you to define an incoming year value and the converted value to use in bucketing and comparison. Values are stored in mpi_wgtnval.

DATE

This anonymous values string is used when you want certain dates ignored during comparison. For example, if a birth date is not available when creating a member record, a “dummy” value might be entered. This string can also be used if you want to prevent all birth dates before a specific date from being used in comparison. Values for this string are stored in mpi_stranon.

HGT

This string is used to generate a standardized member height value (range of heights) for bucketing and comparison. Often the height entered for a member is only an approximation. Using the HGT string, you can define height ranges. For example, you might specify that all height values of 4'8” to 5'2” should have a bucket value of 5'0”, and that 5'3” to 5'7” should have a bucket value of 5'5”. Values are stored in mpi_strnbkt.
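
The height ranges in this example can be expressed as a small range table. The Python below is only a sketch: HEIGHT_BUCKETS and height_bucket are hypothetical names, and storing heights in inches is an assumption made to keep the comparison simple, not the format used in mpi_strnbkt.

```python
from typing import Optional

# (low, high, bucket) in inches, taken from the example ranges:
# 4'8" to 5'2" -> 5'0", and 5'3" to 5'7" -> 5'5".
HEIGHT_BUCKETS = [
    (56, 62, 60),
    (63, 67, 65),
]


def height_bucket(feet: int, inches: int) -> Optional[int]:
    """Return the bucket value in inches, or None if no range covers the height."""
    total = feet * 12 + inches
    for low, high, bucket in HEIGHT_BUCKETS:
        if low <= total <= high:
            return bucket
    return None
```

With these ranges, a member recorded as 5'1” and another recorded as 4'11” fall into the same bucket, so an approximate height does not prevent the two records from being compared.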

NICKNAME

Using the nickname string enables you to map names with alternate names, thus increasing the accuracy of comparisons. Example mappings can include: Abigail to Abbey or Gail, William to Bill or Willie, or Thomas to Tom, Tommy, or Tommie. Values are stored in mpi_strequi.
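
A minimal sketch of how such mappings might be consulted during comparison follows. The NICKNAMES dictionary and names_equivalent function are hypothetical; the product stores these mappings in mpi_strequi and its comparison logic is not shown here. This sketch only treats a name and one of its listed alternates as equivalent, and does not link two alternates to each other.

```python
# Example mappings from the text; in the product these are rows in mpi_strequi.
NICKNAMES = {
    "ABIGAIL": {"ABBEY", "GAIL"},
    "WILLIAM": {"BILL", "WILLIE"},
    "THOMAS": {"TOM", "TOMMY", "TOMMIE"},
}


def names_equivalent(first: str, second: str) -> bool:
    """Return True if the names are identical or one is a listed alternate of the other."""
    a, b = first.upper(), second.upper()
    if a == b:
        return True
    return b in NICKNAMES.get(a, set()) or a in NICKNAMES.get(b, set())
```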

PHONE

As with Social Security numbers, dummy values might be entered in the phone field. Specifying such values prevents them from being used in comparisons. Values are stored in mpi_stranon.

PXNM-DGR

This string enables you to define additional standardization for values that might appear in a member name field; for example CPA, DDS, or ESQ. Values are stored in mpi_strword.

PXNM-FNJ

This string enables you to define additional standardization for values that might appear in a member's first name; for example “Lo” or “La.” Values are stored in mpi_strword.

PXNM-LNJ

This string enables you to define additional standardization for values that might appear in a member's last name; for example “Ben” or “Von.” Values are stored in mpi_strword.

PXNM-MNJ

This string enables you to define additional standardization for values that might appear in a member's middle name; for example “Jo” or “La.” Values are stored in mpi_strword.

PXNM-PFX

This string enables you to define additional standardization for values that might appear as a prefix in a member name; for example “Dr” or “Mrs.” Values are stored in mpi_strword.

PXNM-SFX

This string enables you to define additional standardization for values that might appear as a suffix in a member name; for example “Jr” or “II.” Values are stored in mpi_strword.

QXNM-NAME

You have the option to specify RealNames in the mpi_strword table using a strcode of QXNM-NAME. The words specified in this table are not compared for phonetic or edit-distance matches. All real names need to exist in mpi_strword with a strcode of QXNM-NAME. See Enabling RealNames with QXNM.

RXNMARABICRULES

This string provides an example set of ENCODER rules for processing Arabic names with RXNM. See Custom Phonetics and Rule Sets for more information.

SSA

Often when a member record is entered, the Social Security number is not available and a dummy value is entered instead (for example, 999999999). By specifying such values in this string, you enable the algorithm to identify which values to ignore during comparison. Values are stored in mpi_stranon.

WGT

For implementations using member weight comparisons, this string is used to generate a standardized member weight value (range of weights) for bucketing and comparison. Bucketing and comparing on exact weight values can be difficult as the weight entered for a person might be just an approximation. Using the WGT string, you can define weight ranges. For example, you might specify that all weight values of 100 pounds to 109 pounds should have a bucket value of 100 pounds, and 110 to 119 should have a bucket value of 110. Values are stored in mpi_strnbkt.
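
When the ranges follow a regular pattern, as in this example, the range definitions can be generated rather than typed by hand. The following Python sketch assumes fixed 10-pound bands whose bucket value is the low end of the band; the function name weight_bucket_rows and the (low, high, bucket) row layout are hypothetical and do not reflect the actual mpi_strnbkt format.

```python
from typing import List, Tuple


def weight_bucket_rows(start: int, stop: int, width: int = 10) -> List[Tuple[int, int, int]]:
    """Build (low, high, bucket) rows covering start up to, but not including, stop."""
    return [(low, low + width - 1, low) for low in range(start, stop, width)]


if __name__ == "__main__":
    # Print the two ranges described in the example above.
    for row in weight_bucket_rows(100, 120):
        print(row)
```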

ZIPCODE

Dummy values might be entered when this information is not available during member record creation. Specifying dummy Zip Code values prevents them from being used in comparisons. Values are stored in mpi_stranon.