AIIM CIP Deck Flashcards
ANSI/NISO Z39.19-2005
Definition of Taxonomy
ISO 15489-1:2001
metadata definition and categorization (records management generally)
ISO/IEC 15948:2004
PNG
ISO/IEC 10918-1
JPEG
ISO 19005
PDF/A
ISO 32000
PDF as an ISO standard (ISO 32000-2 is PDF 2.0)
ISO 23081
metadata accrues over time; metadata generally
Capture
The process of (1) getting information from its source into a more formal information management environment (input) and (2) then recording its existence in the system (tagging)
Sources of information
- PCs
- Laptops
- Tablets
- Phones
- Other new technology
- File shares
- Local storage drives
- Disks
- USB drives
- Hosted applications
- Paper
- Business apps (CRM, HR)
Structured information
- Fundamentally spreadsheet-like data, in a single table or in many linked tables
- has a fixed structure
- usually database
- most information repositories are combinations of structured data and some place to store binary files associated with them
Unstructured Information
Variable in:
* format
* content
e.g., Word, Excel, Project, PDF, scanned TIFF, email
Might have rules associated with content
Capturing structured
input manually by data entry, or by a system
extracted from one system to others.
Capturing unstructured
a formal procedure so…
(1) info can be controlled
(2) filed in structured content with related items
to give context, protect, retain, and search.
File Format Considerations
Formats can be highly proprietary, accessible only with special software or tools. Consider:
- audience
- regulatory requirements
- value of the information over time
JPEG
Joint Photographic Experts Group
Very good at compression of continuous tone images.
Lossy compression - some data is discarded each time the image is compressed, so repeated conversions compound the data loss
ISO/IEC 10918-1
PDF/A - archiving (ISO 19005, based on ISO 32000)
PDF/E - engineering
PDF/X - prepress digital exchange
PDF/UA - Universal Accessibility
PNG
Portable Network Graphics
Lossless compression
32-bit color
A W3C standard (1996)
ISO/IEC 15948:2004
TIFF
Tagged Image File Format
- most common format for scanned images
- lossless compression, good for bi-tonal (black and white) images
- also supports multiple pages, but not all TIFF viewers support all options, and not all browsers can display TIFF.
ECM
Enterprise Content Management
Core capability
- Document management - check-out, tracking, versioning
- Records management - formal, content-based rules and specific retention
- Workflow - take specified actions based on metadata
- Search
- Web content management
- Capture/scanning - applying metadata
- Collaboration
- Publishing - making content available on multiple platforms
- Archiving - for collaborative communication
Digital Asset Management
ECM for media
rich media, audio, video, digital photographs, design documents, logos
May be a dedicated system or added on to ECM
- tracks copyright and license restrictions
- specialized metadata
EFSS
Enterprise File Syncing and Sharing Solutions
- mostly cloud-based
- allow users to share and sync documents across multiple devices
BUT they come from the consumer market and lack many enterprise functions such as central control, security, metadata, and lifecycle management
Capture - taking control
input side of information management
Everything is tagged: for every object ask "Where does this go?"
Microsoft Office Integration
ECM systems typically integrate with Office by intercepting the File > Save menu.
ECM - importing existing content
- Most ECM have simple tools
- system admins may also have batch import utilities to automate the process, BUT some prep is needed, including cleanup of duplicates and junk; follow the company's IG rules
Automating Information Capture
Ways to automate
- by role or user
- by content type
- by work process
- through workflows
- through metadata values
- through bulk import
- through analytics
AutoCapture by role or User
- identify the users most likely to create records, then capture everything they create
Usually
*senior management
*assistants to senior management
*specific roles, e.g., legal staff or personnel/HR staff
*anyone making business decisions
AutoCapture by Content Type
Identify specific types of documents, e.g., contracts, invoices, personnel records. Many ECMs let you define a content type or record class and associate metadata fields or values, business rules, and blank templates with it.
AutoCapture by Work Process
Processes that are inherently decision- or transaction-oriented; the record is normally captured by default when the final version is approved or signed
- contracts
- invoices
- wage statements
- financial statements
AutoCapture through workflows
Rules can be defined so that at a certain step in a workflow a record is created, identified, and then processed.
E.g., in contract review the contract becomes a record when it is approved; when it is executed, the executed copy is associated with that record.
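A minimal Python sketch of the contract-review example, assuming a hypothetical four-step workflow (draft, review, approved, executed) and hypothetical `Record` and `ContractWorkflow` classes; a real ECM would configure this as a workflow rule, not custom code. The record is created at the approval step, and the executed copy is later attached to that same record.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    record_id: str
    document: str
    attachments: List[str] = field(default_factory=list)


@dataclass
class ContractWorkflow:
    """Hypothetical contract-review workflow with a capture rule on approval."""
    document: str
    step: str = "draft"
    record: Optional[Record] = None

    # Class constant (no annotation, so dataclass does not treat it as a field).
    STEPS = ("draft", "review", "approved", "executed")

    def advance(self, executed_copy: Optional[str] = None) -> None:
        position = self.STEPS.index(self.step)
        if position == len(self.STEPS) - 1:
            raise ValueError("workflow already complete")
        self.step = self.STEPS[position + 1]
        if self.step == "approved":
            # Capture rule: approval creates and identifies the record.
            self.record = Record(record_id=f"REC-{self.document}", document=self.document)
        elif self.step == "executed" and executed_copy is not None:
            # The executed (signed) copy is associated with the existing record.
            self.record.attachments.append(executed_copy)
```

Usage: call `advance()` twice to reach approval (the record now exists), then `advance("C-001-signed.pdf")` to attach the executed copy.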
AutoCapture through metadata Values
Used with bulk import: a tool can crawl metadata such as file formats, dates, and location to decide what to capture
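A minimal sketch of capture driven only by crawled metadata values. The `CrawledFile` fields, the format list, the cutoff date, and the share paths are all hypothetical rules an admin might set before a bulk import; the file contents are never opened.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CrawledFile:
    path: str
    file_format: str  # file extension, e.g. "pdf"
    modified: date
    location: str     # share or folder path

# Hypothetical capture rules (assumptions, not a standard).
CAPTURE_FORMATS = {"pdf", "docx", "xlsx"}
EARLIEST_DATE = date(2015, 1, 1)
RECORD_LOCATIONS = ("/shares/contracts", "/shares/finance")


def should_capture(f: CrawledFile) -> bool:
    """Decide from metadata values alone: format, date, and location."""
    return (
        f.file_format.lower() in CAPTURE_FORMATS
        and f.modified >= EARLIEST_DATE
        and f.location.startswith(RECORD_LOCATIONS)  # startswith accepts a tuple
    )


def select_for_import(files):
    """Return the paths of crawled files that pass every rule."""
    return [f.path for f in files if should_capture(f)]
```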
Autocapture through Bulk import
taking legacy information and building into new system
- may be too expensive to do one item at a time
- some content needs to be converted to a flat-file format to import into the new system
- other systems have utilities to do this
- to be valuable, metadata should be included in the import
AutoCapture through Analytics
Relies on text, metadata, rules, etc.; SMEs help define the rules.
- more scalable
- doesn't rely on humans
- more consistent (even when wrong, it is consistently wrong)
- transparent
forms processing
Most of a form is unwanted; only the field data matters.
1. Scan the form
2. Is it readable?
3. Which standard form is it?
Form recognition software then places the field information into a database.
formal v. informal content
Formal - if it documents or supports business decisions
*records required by law or regulations
*signed or executed contracts
*invoices
*formal planning or strategy
Informal - everything else
*copies available for discussion and familiarization
*early or outdated drafts of contracts
*draft meeting minutes
Capturing from User-driven IT
Set policies on formal and informal content: formal content goes in the ECM or another system with version control and security.
Informal content can be on more consumer-based platforms.
Capturing email
Most ECMs integrate with email, allowing filing with minimal clicks, and filing can also be automatic. Emailing links to ECM content instead of attachments keeps volume down.
Challenges:
*attachments in various formats
*multiple topics in one message
*CC/BCC make it complicated
*quota or size limits
*email used as a data repository
Approaches to Managing Email
By role:
- managers
- staff assistants
- high-level policy decision makers
By "big buckets":
- inbox - new, short period, 30-180 days
- work in progress - working on processes, 6 months to 1-3 years
- records zone - retention schedule
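A minimal sketch of the "big buckets" approach. The 90-day inbox and 2-year work-in-progress periods are assumed values picked from inside the ranges above, and the "expired" bucket assumes that non-record email older than the work-in-progress period becomes eligible for disposal; an organization's own policy decides both.

```python
from datetime import date

INBOX_DAYS = 90              # assumed; the source range is 30-180 days
WORK_IN_PROGRESS_DAYS = 730  # assumed; the source range is 6 months to 1-3 years


def bucket_for(received: date, today: date, is_record: bool) -> str:
    """Place one email in a bucket by record status first, then by age."""
    if is_record:
        return "records zone"  # kept according to the retention schedule
    age_days = (today - received).days
    if age_days <= INBOX_DAYS:
        return "inbox"
    if age_days <= WORK_IN_PROGRESS_DAYS:
        return "work in progress"
    # Assumption: past the WIP period and not a record, so eligible for disposal.
    return "expired"
```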
OCR
Optical character recognition
Accuracy around 90% on good, high-quality images; usually TIFF or PDF
Can also use zones for forms processing
ICR and HCR
Intelligent Character Recognition and Handwriting Character Recognition
to read and extract handwriting or other characters (scantron)
Lower accuracy although improving
Scanning considerations and benefits
- amount of documents
- type and quality needed (OCR?)
Benefits
- accessibility
- searchability
- security and user access controls
- space and cost savings
scanning process approaches
what to scan
- Scan everything in the back file
- expensive
- but full record
- Do a partial conversion
- cost effective
- hard to search (two repositories to search)
- Day forward
- after this day everything scanned
- Scan on demand
OMR
Optical Mark Recognition
check boxes, yes/no marks (e.g., Scantron sheets)
Problems with Email as Collaborative tool
- information chaos - tons of it in knowledge silos
- locks down information and knowledge
- distracts knowledge workers
- lacks information filters
- makes it difficult to share large files
Collaboration with electronic documents
Problems:
- volume is huge
- no ECM or information management (e.g., ungoverned SharePoint)
- too many versions
- the result is a digital landfill
findability
not just typing in a search box but anticipating needs and finding things you didn’t know you needed
metadata
relevant data about data -NISO
ISO 15489-1:2001: data describing the context, content, and structure of records and their management over time
ISO 23081: metadata accrues over time
ERP
Enterprise resource planning
Software to collect and organize data to provide management insight with KPIs
KPI
Key Performance Indicators
What is search?
The ART and SCIENCE of making content easy to find.
Art - language arts: using software to parse, diagram, or infer meaning from content
Science - library science: techniques like metatagging, categorization, and taxonomies.
The metadata strategy
- Create a metadata model: identify and understand types and their purposes
- Synchronize metadata definitions and types
- Understand the processes, purposes, and people that create and use metadata, structures, and vocabularies; design with an eye to refresh and improvement
- Find opportunities for automation and time savings.
metadata standards
For specific industries or situations:
- MoReq
- DoD 5015.2
- EDRM (edrm.net)
- XML
- Schema.org
- digital images: EXIF, IPTC, and XMP
Intrinsic Metadata
Assigned by default when the file is created: file date, file size, title, author. Reusing files (e.g., editing a copy of an old document) may make this information wrong.
Manual data entry
entered by user or by others later
Inherited metadata
*by location
*or classification
item inherits metadata from group
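A minimal sketch of inheritance by location, assuming a hypothetical `Folder` class: values are applied from the root folder down, so a closer folder overrides a more distant one, and the item's own values override everything inherited.

```python
from typing import Optional


class Folder:
    """Hypothetical folder (location) that carries metadata for its contents."""

    def __init__(self, name: str, metadata: dict, parent: Optional["Folder"] = None):
        self.name = name
        self.metadata = metadata
        self.parent = parent

    def effective_metadata(self) -> dict:
        # Collect the chain from this folder up to the root.
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        # Apply root first so nearer folders override farther ones.
        merged = {}
        for folder in reversed(chain):
            merged.update(folder.metadata)
        return merged


def item_metadata(folder: Folder, own: dict) -> dict:
    """Metadata for an item filed in `folder`; the item's own values win."""
    merged = folder.effective_metadata()
    merged.update(own)
    return merged
```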
metadata extraction using recognition tech
OCR with zones, and barcode recognition, to extract data
metadata from existing data sources
extract from a database
deduplicate and normalize
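A minimal sketch of normalizing rows pulled from an existing database and then removing duplicates. The `customer` and `country` field names and the normalization rules (collapse whitespace, title-case names, upper-case country codes) are assumptions for illustration. Normalizing before comparing is what lets "acme  corp" and "ACME Corp" count as the same value.

```python
def normalize(row: dict) -> dict:
    """Hypothetical normalization rules for two fields."""
    return {
        "customer": " ".join(row["customer"].split()).title(),
        "country": row["country"].strip().upper(),
    }


def dedup_normalized(rows):
    """Normalize each row, keep the first occurrence of each normalized key."""
    seen = set()
    out = []
    for row in rows:
        clean = normalize(row)
        key = (clean["customer"], clean["country"])
        if key not in seen:
            seen.add(key)
            out.append(clean)
    return out
```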
LDAP
Lightweight Directory Access Protocol - identifies people, countries, organizations, and organizational units; used in email systems
metadata via user credentials
using LDAP or other identity sources
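A minimal sketch of filling metadata from the logged-in user's directory entry. The `DIRECTORY` dict is a hypothetical stand-in for an LDAP or other identity-source lookup (no real LDAP library is called), and the attribute names are illustrative.

```python
# Hypothetical stand-in for a directory; a real system would query LDAP or AD.
DIRECTORY = {
    "jdoe": {"displayName": "Jane Doe", "department": "Legal", "c": "US"},
}


def metadata_from_credentials(username: str, directory: dict = DIRECTORY) -> dict:
    """Derive author/department/country metadata from the user's entry."""
    entry = directory.get(username)
    if entry is None:
        # Unknown user: fall back to the login name only.
        return {"author": username}
    return {
        "author": entry["displayName"],
        "department": entry["department"],
        "country": entry["c"],
    }
```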
metadata capture via workflow
process oriented metadata
routing data
best in transactional situations
Content analytics help automate:
- creating classification buckets
- sorting or identifying information
- searching for relevant information
- search across languages
- creating word lists or taxonomies
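A minimal sketch of sorting documents into classification buckets. This uses simple keyword rules of the kind an SME might write, which is only the most basic form of content analytics; commercial tools typically add statistical or machine-learning models. The bucket names, keyword lists, and the two-hit threshold are hypothetical.

```python
import re

# Hypothetical buckets and keyword lists an SME might define.
BUCKETS = {
    "contract": {"agreement", "party", "term", "signature"},
    "invoice": {"invoice", "amount", "due", "payment"},
}


def classify(text: str, min_hits: int = 2) -> str:
    """Return the bucket with the most keyword hits, or 'unclassified'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {name: len(words & keywords) for name, keywords in BUCKETS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else "unclassified"
```

Tuning `min_hits` trades precision against recall, which is why the steps that follow stress reviewing results with SMEs and iterating on more documents.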
steps for successful content analytics
- select representative documents
a good mix; the more volume, the more mix
- use SMEs to review the results
-iterate with more documents to test
recall/precision
100% precision - no false positives
100% recall - no false negatives
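A short worked sketch of the two measures: precision = relevant items retrieved / all items retrieved, and recall = relevant items retrieved / all relevant items. Returning 1.0 for an empty set is an assumed convention, not a standard one.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query's result set."""
    true_positives = len(retrieved & relevant)
    # Assumed convention: an empty set scores 1.0 (nothing wrong, nothing missed).
    precision = true_positives / len(retrieved) if retrieved else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall
```

Example: 4 documents retrieved, 2 of them relevant, out of 3 relevant documents in total, gives precision 2/4 = 0.5 (two false positives) and recall 2/3 (one false negative).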
categorization
ISO 15489
The systematic identification and arrangement of business activities and/or records into categories according to logically structured conventions, methods, and procedural rules represented in a classification system
Benefits of Classification (8)
- provides linkages between individual records, showing a continuous record of activity
- ensures consistent naming
- assists in retrieving all records relating to a function, topic, or activity
- assists in determining security and access for records
- assists in allocating user permissions
- assists in distributing responsibilities for managing records
- assists in distributing records
- makes it easier to determine and apply record retention
taxonomy
ANSI/NISO Z39.19-2005
A collection of controlled vocabulary terms organized into a hierarchical structure. Each term is in one or more parent/child (broader/narrower) relationships to other terms
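A minimal sketch of the broader/narrower structure in the definition above, using a hypothetical `Taxonomy` class. Because a term may have more than one parent, `ancestors` walks every broader path, so it also handles a poly-hierarchy.

```python
from collections import defaultdict


class Taxonomy:
    """Controlled terms linked by broader (parent) / narrower (child) relations."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> parent terms
        self.narrower = defaultdict(set)  # term -> child terms

    def add(self, parent: str, child: str) -> None:
        self.narrower[parent].add(child)
        self.broader[child].add(parent)

    def ancestors(self, term: str) -> set:
        """All broader terms reachable from `term`, across every parent path."""
        result, stack = set(), [term]
        while stack:
            for parent in self.broader.get(stack.pop(), ()):
                if parent not in result:
                    result.add(parent)
                    stack.append(parent)
        return result
```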
Lambe criteria to evaluate a taxonomy (DUMBCHIRP)
Durable - no need for lots of changes
Unambiguous - only one place for records
Meaningful - categories and terms agree with general usage
Balanced - categories, subcategories, and content dispersed evenly
Consistent
Hospitable - covers all content, now and in the future
Intuitive - user-friendly
Relevant - representative of how the business works
Parsimonious - no unused categories
Developing categorization scheme
- Identify stakeholders
- define purpose
- determine approach
- collect information
- draft the scheme
- pilot
- deploy
- gather feedback
keyword search
usually a 1-to-1 or 1-to-many correlation
matches character strings, using a search engine
usually queries the content, index, and metadata.
human powered directory
compiled by humans, e.g., editors
speeds the process but is subjective.
query engine
computer program that actually searches documents for words
- structured, like Boolean queries, or
- unstructured like plain language
search index
tells user where what they seek is found
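A minimal sketch of a search index (an inverted index mapping each term to the documents that contain it) plus a structured Boolean AND query over it. The tokenizer is deliberately naive and does no stemming, so "invoice" and "invoices" are different terms; the document IDs are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict) -> dict:
    """Inverted index: term -> set of document IDs containing that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_and(index: dict, query: str) -> set:
    """Boolean AND: documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```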
enterprise search
How people in an organization can find what they need in any repository, in any form. Google is not a magic bullet: it can't identify copies, duplicates, or the original.
universal search
one tool creates one index across many repositories
disregards all other search tools or indices
application search
search tool built into an application or system
- limited to only the info managed by the application
- user management controls built in
- user management controls built in
- not as feature rich
homogenous search engine
the same search technology used on many information sets.
e.g., the same search tool on three document management applications pulls from the three indices and combines the results.
Federated search
multiple repositories and multiple search engines
master search performs search on other search engines then organizes the results
MERLOT is an example
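A minimal sketch of a master search that fans a query out to several engines and organizes the results. The engines are hypothetical callables returning `(doc_id, score)` pairs on their own scales; rescaling each engine's scores by its top score and keeping a document's best score are assumed merging choices, one of many possible.

```python
from typing import Callable

# Hypothetical engine interface: query -> list of (doc_id, score).
SearchEngine = Callable[[str], list]


def federated_search(query: str, engines: dict) -> list:
    """Query every engine, rescale scores to 0..1, merge, sort best first."""
    best = {}  # doc_id -> (normalized score, engine name)
    for name, engine in engines.items():
        hits = engine(query)
        if not hits:
            continue
        top = max(score for _, score in hits) or 1.0  # guard against a zero max
        for doc_id, score in hits:
            norm = score / top
            if doc_id not in best or norm > best[doc_id][0]:
                best[doc_id] = (norm, name)
    merged = [(doc_id, score, source) for doc_id, (score, source) in best.items()]
    return sorted(merged, key=lambda row: -row[1])
```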
google search
relevance ranking by links
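Google's original link-based ranking was PageRank. The sketch below is a simplified power-iteration version with the commonly cited damping factor of 0.85; it illustrates the idea (pages linked to by highly ranked pages rank higher) and is not Google's current ranking algorithm.

```python
def pagerank(links: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """Simplified PageRank by power iteration; `links` maps page -> outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly along its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no links): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank
```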
Types of taxonomies
- lists
- trees: parent child categories
- hierarchies: strict rules on how subdivided - mutually exclusive
- poly-hierarchies: break the rules; not mutually exclusive, with cross-linkages
- facets: multiple hierarchies
- matrix: two or three facets in matrix
- system maps: related terms
- Thesaurus:
- Ontologies: concept-relationship-concept
Information Governance
The specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of information. Includes: processes, roles, standards and metrics to achieve goals.
ISO 13008
information and documentation: digital conversion and migration process
ISO 16175
Information and documentation: principles and functional requirements for records in electronic office environments
NFPA 232
Standard for the Protection of Records, by the National Fire Protection Association.
ANSI/ARMA 19-2012
policy design for electronic messages