Getting and cleaning data - XML Flashcards
XML THE ESSENTIALS
XML ARCHITECTURE
XML FIRST STEPS
- library(XML)
- Assign a variable to shorten code lines
> fileUrl <- "http://www.w3schools.com/xml/simple.xml"
- Load document into memory
> doc <- xmlTreeParse(fileUrl, useInternalNodes = TRUE)
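The first steps above can be sketched without network access. This is a minimal sketch, assuming the XML package is installed; menuXml is a hypothetical two-item sample modeled on simple.xml, not the real file. It also shows xmlRoot(), which yields the rootNode object used on the later cards.

```r
library(XML)

# Hypothetical two-item sample modeled on simple.xml, so no download is needed
menuXml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price></food>",
  "<food><name>French Toast</name><price>$4.50</price></food>",
  "</breakfast_menu>"
)

# asText = TRUE marks the first argument as XML content rather than a file name or URL
doc <- xmlTreeParse(menuXml, asText = TRUE, useInternalNodes = TRUE)

# The root node is the starting point for exploring the tree
rootNode <- xmlRoot(doc)
rootName <- xmlName(rootNode)    # name of the root element
childNames <- names(rootNode)    # names of its direct children
print(rootName)
print(childNames)
```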
XML EXPLORATION DRILLING
XML EXTRACTING DRILLING - 1
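- Extract and display 1st section (rootNode is the root element of the document, e.g. rootNode <- xmlRoot(doc))
> rootNode[[1]]
<food>
  <name>Belgian Waffles</name>
  <price>$5.95</price>
  <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description>
  <calories>650</calories>
</food>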
XML EXTRACTING DRILLING - 2
- Extract subsection 1 of section 1
> rootNode[[1]][[1]]
<name>Belgian Waffles</name>
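The two drilling steps above can be tried on a small inline document. A minimal sketch, assuming the XML package is installed; menuXml is a hypothetical sample modeled on simple.xml, and firstFood and firstField are illustrative names.

```r
library(XML)

# Hypothetical two-item sample modeled on simple.xml
menuXml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price></food>",
  "<food><name>French Toast</name><price>$4.50</price></food>",
  "</breakfast_menu>"
)
doc <- xmlTreeParse(menuXml, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)

# [[i]] on a node returns its i-th child, much like indexing a list
firstFood <- rootNode[[1]]         # section 1: the first <food> element
firstField <- rootNode[[1]][[1]]   # subsection 1 of section 1: its <name> element

print(xmlName(firstFood))    # element name of section 1
print(xmlSize(firstFood))    # number of child nodes in section 1
print(xmlName(firstField))   # element name of subsection 1
```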
xmlValue
Function that returns the text content of an XML node
e.g. xmlValue(rootNode[[1]][[1]])
Extracts and displays the content of the node corresponding to
subsection 1 of section 1 of the document
To apply it to every child of a node, use xmlSApply, e.g. xmlSApply(rootNode[[1]], xmlValue)
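A short sketch of xmlValue on a single node and on every child of a node, assuming the XML package is installed. The inline menuXml sample is hypothetical (modeled on simple.xml), as are the names nameText and fields.

```r
library(XML)

# Hypothetical two-item sample modeled on simple.xml
menuXml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price></food>",
  "<food><name>French Toast</name><price>$4.50</price></food>",
  "</breakfast_menu>"
)
doc <- xmlTreeParse(menuXml, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)

# Text content of one node: subsection 1 of section 1
nameText <- xmlValue(rootNode[[1]][[1]])

# xmlSApply calls the function on each child of the node it is given,
# so this collects the text of every field of the first <food> element
fields <- xmlSApply(rootNode[[1]], xmlValue)

print(nameText)
print(fields)
```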
USING XPATH AS NODE POINTER -1
- nodename Selects all nodes with the name “nodename”
- / Selects from the root node
- // Selects nodes in the document from the current node that match the selection no matter where they are
- . Selects the current node
- .. Selects the parent of the current node
- @ Selects attributes
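The current-node, parent and attribute selectors listed above can be tried from a specific node. A minimal sketch, assuming the XML package is installed; bookXml is a made-up sample modeled on the w3schools bookstore example, and firstBook is an illustrative name. Passing a node (rather than the document) to xpathSApply makes that node the context for relative expressions.

```r
library(XML)

# Made-up sample modeled on the w3schools bookstore example
bookXml <- paste0(
  "<bookstore>",
  '<book><title lang="en">Book One</title><price>30.00</price></book>',
  '<book><title lang="en">Book Two</title><price>29.99</price></book>',
  "</bookstore>"
)
doc <- xmlTreeParse(bookXml, asText = TRUE, useInternalNodes = TRUE)

# Use the first <book> element as the context node
firstBook <- getNodeSet(doc, "/bookstore/book")[[1]]

current <- xpathSApply(firstBook, ".", xmlName)            # . the current node
parent <- xpathSApply(firstBook, "..", xmlName)            # .. its parent
childTitle <- xpathSApply(firstBook, "title", xmlValue)    # child named title
titleLang <- xpathSApply(firstBook, "title/@lang")         # @ an attribute

print(c(current, parent, childTitle, as.character(titleLang)))
```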
USING XPATH AS NODE POINTER -2
- bookstore Selects all nodes with the name “bookstore”
- /bookstore Selects the root element bookstore
- bookstore/book Selects all book elements that are children of bookstore
- //book Selects all book elements no matter where they are in the document
- bookstore//book Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
- //@lang Selects all attributes that are named lang
http://www.w3schools.com/xpath/xpath_syntax.asp
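The path expressions above can be checked by counting what they select. A minimal sketch, assuming the XML package is installed; bookXml is a made-up sample modeled on the w3schools bookstore example. The expressions are written in absolute form (leading /) so the results do not depend on which node is the context.

```r
library(XML)

# Made-up sample modeled on the w3schools bookstore example
bookXml <- paste0(
  "<bookstore>",
  '<book><title lang="en">Book One</title><price>30.00</price></book>',
  '<book><title lang="en">Book Two</title><price>29.99</price></book>',
  '<book><title lang="fr">Book Three</title><price>49.99</price></book>',
  "<book><title>Book Four</title><price>39.95</price></book>",
  "</bookstore>"
)
doc <- xmlTreeParse(bookXml, asText = TRUE, useInternalNodes = TRUE)

rootName <- xpathSApply(doc, "/bookstore", xmlName)         # the root element
nChildBooks <- length(getNodeSet(doc, "/bookstore/book"))   # book children of bookstore
nAllBooks <- length(getNodeSet(doc, "//book"))              # book elements anywhere
nDescBooks <- length(getNodeSet(doc, "/bookstore//book"))   # book descendants of bookstore
langs <- xpathSApply(doc, "//@lang")                        # values of all lang attributes

print(rootName)
print(c(nChildBooks, nAllBooks, nDescBooks))
print(as.character(langs))
```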
USING XPATH AS NODE POINTER -3
- /bookstore/book[1] Selects the first book element that is the child of the bookstore element.
- /bookstore/book[last()-1] Selects the last but one book element that is the child of the bookstore element
- /bookstore/book[position()<3] Selects the first two book elements that are children of the bookstore element
- //title[@lang] Selects all the title elements that have an attribute named lang
- //title[@lang='en'] Selects all the title elements that have an attribute named lang with a value of 'en'
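The predicates above can be run as follows. A minimal sketch, assuming the XML package is installed; bookXml is a made-up sample modeled on the w3schools bookstore example, with one title deliberately left without a lang attribute.

```r
library(XML)

# Made-up sample modeled on the w3schools bookstore example
bookXml <- paste0(
  "<bookstore>",
  '<book><title lang="en">Book One</title><price>30.00</price></book>',
  '<book><title lang="en">Book Two</title><price>29.99</price></book>',
  '<book><title lang="fr">Book Three</title><price>49.99</price></book>',
  "<book><title>Book Four</title><price>39.95</price></book>",
  "</bookstore>"
)
doc <- xmlTreeParse(bookXml, asText = TRUE, useInternalNodes = TRUE)

# Positional predicates (XPath positions start at 1)
firstTitle <- xpathSApply(doc, "/bookstore/book[1]/title", xmlValue)
lastButOne <- xpathSApply(doc, "/bookstore/book[last()-1]/title", xmlValue)
firstTwo <- xpathSApply(doc, "/bookstore/book[position()<3]/title", xmlValue)

# Attribute predicates
withLang <- xpathSApply(doc, "//title[@lang]", xmlValue)
english <- xpathSApply(doc, "//title[@lang='en']", xmlValue)

print(firstTitle)
print(lastButOne)
print(firstTwo)
print(withLang)
print(english)
```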
USING XPATH AS NODE POINTER -4
- /bookstore/book[price>35.00] Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
- /bookstore/book[price>35.00]/title Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
- /bookstore/* Selects all the child element nodes of the bookstore element
- //* Selects all elements in the document
- //title[@*] Selects all title elements which have any attribute
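Value predicates and wildcards can be checked the same way. A minimal sketch, assuming the XML package is installed; bookXml is a made-up sample modeled on the w3schools bookstore example, with 1 bookstore, 4 book, 4 title and 4 price elements.

```r
library(XML)

# Made-up sample modeled on the w3schools bookstore example
bookXml <- paste0(
  "<bookstore>",
  '<book><title lang="en">Book One</title><price>30.00</price></book>',
  '<book><title lang="en">Book Two</title><price>29.99</price></book>',
  '<book><title lang="fr">Book Three</title><price>49.99</price></book>',
  "<book><title>Book Four</title><price>39.95</price></book>",
  "</bookstore>"
)
doc <- xmlTreeParse(bookXml, asText = TRUE, useInternalNodes = TRUE)

# Titles of books whose price element is greater than 35.00
expensive <- xpathSApply(doc, "/bookstore/book[price>35.00]/title", xmlValue)

# Wildcards: * matches any element, @* matches any attribute
childNames <- xpathSApply(doc, "/bookstore/*", xmlName)
nElements <- length(getNodeSet(doc, "//*"))
nTitlesWithAttr <- length(getNodeSet(doc, "//title[@*]"))

print(expensive)
print(childNames)
print(c(nElements, nTitlesWithAttr))
```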
USING XPATH AS NODE POINTER -5
- //book/title | //book/price Selects all the title AND price elements of all book elements
- //title | //price Selects all the title AND price elements in the document
- /bookstore/book/title | //price Selects all the title elements of the book element of the bookstore element AND all the price elements in the document
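The | operator above merges the node sets of two paths. A minimal sketch, assuming the XML package is installed; bookXml is a made-up sample, and the shipping element is a hypothetical addition so that one price sits outside any book. Results are sorted before printing so the example does not rely on the order in which the merged nodes come back.

```r
library(XML)

# Made-up sample: two books plus a hypothetical <shipping> element holding a price
bookXml <- paste0(
  "<bookstore>",
  '<book><title lang="en">Book One</title><price>30.00</price></book>',
  '<book><title lang="en">Book Two</title><price>29.99</price></book>',
  "<shipping><price>4.50</price></shipping>",
  "</bookstore>"
)
doc <- xmlTreeParse(bookXml, asText = TRUE, useInternalNodes = TRUE)

# Titles and prices that belong to a book
inBooks <- xpathSApply(doc, "//book/title | //book/price", xmlValue)

# Titles and prices anywhere, which also picks up the shipping price
anywhere <- xpathSApply(doc, "//title | //price", xmlValue)

# Book titles under bookstore, plus every price in the document
mixed <- xpathSApply(doc, "/bookstore/book/title | //price", xmlValue)

print(sort(inBooks))
print(sort(anywhere))
print(sort(mixed))
```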
XPATH EXAMPLES
xpathSApply is part of the XML package
> xpathSApply(rootNode, "//name", xmlValue)
[1] "Belgian Waffles" "Strawberry Belgian Waffles" "Berry-Berry Belgian Waffles"
[4] "French Toast" "Homestyle Breakfast"
> xpathSApply(rootNode, "//price", xmlValue)
[1] "$5.95" "$7.95" "$8.95" "$4.50" "$6.95"
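The two character vectors above line up item by item, so they can be combined into a table. A minimal sketch, assuming the XML package is installed; menuXml is a hypothetical two-item sample modeled on simple.xml, and the data frame and numeric conversion are one possible follow-up, not a step taken from the cards.

```r
library(XML)

# Hypothetical two-item sample modeled on simple.xml
menuXml <- paste0(
  "<breakfast_menu>",
  "<food><name>Belgian Waffles</name><price>$5.95</price></food>",
  "<food><name>French Toast</name><price>$4.50</price></food>",
  "</breakfast_menu>"
)
doc <- xmlTreeParse(menuXml, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(doc)

itemNames <- xpathSApply(rootNode, "//name", xmlValue)
itemPrices <- xpathSApply(rootNode, "//price", xmlValue)

# Drop the leading "$" (fixed = TRUE treats it literally) and convert to numbers
menu <- data.frame(
  name = itemNames,
  price = as.numeric(sub("$", "", itemPrices, fixed = TRUE)),
  stringsAsFactors = FALSE
)
print(menu)
```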