I have been listening to a lot of podcasts about programming languages, and I find the subject fascinating. I am trying to be more of a craftsman: as I write code, at my job or in my personal time, I ask myself whether it is concise and legible. I was working on some software tools to read and manipulate websites programmatically, and I found myself writing a lot of code to find a field on the page that matches certain criteria and then take action on that field. I took the opportunity to explore what could be done. I did this before doing much research so that I could discover the answers myself. Later I compared my work with that of others.
For example, I had a web page where I needed to click on a specific link. The anchor tag had no ID or Name associated with it, so I had to search through all the anchor tags for one with specific inner HTML. This looks like:
MyDocument = WebBrowser.Document
MyElements = MyDocument.getElementsByTagName(TagName)
'Collect every element whose inner HTML contains the search text
For n = 0 To MyElements.length - 1
    MyElement = MyElements(n)
    If InStr(MyElement.innerHTML, InnerHTML) > 0 Then
        ReturnElements.Add(MyElement)
    End If
Next
'Click the first match, if any was found
If ReturnElements.Count > 0 Then
    Dim FirstMatch As IHTMLElement = ReturnElements(0)
    FirstMatch.click()
End If
That’s not ideal code, but you get the point. I wrote it while I was iterating, so I didn’t optimize it. Since I was doing this a lot, I made a function that did it all and added it as an extension method to the Web Browser class. Its use looked like:
webbrowser.ClickByInnerHTMLContainsAndWait("a", "By Channel")
That looked concise, but I quickly had other needs and ended up with other extensions like GetElementsIfOuterHTMLContains and GetElementsIfInnerTextContains, which resulted in an explosion of methods. Each call was easy to read, but the whole had a complex structure and was difficult to maintain. I decided I wanted a more flexible syntax. Taking a step back, I decided to write out what I wanted to say first, and then see whether I could bend the language to do it my way. Here is what I wanted:
'In all the text fields on the web page that contain the word "Bentley", replace it with "Smart"
WebBrowser.textfields.replace("Bentley", "Smart")

'This will click the "Next" link
WebBrowser.links.innerText.contains("Next").first.click

'You can even chain things together
'Click the last link on the web page that begins with "St" and ends with "x"
WebBrowser.links.innerText.beginsWith("St").innerText.endsWith("x").last.click
That would be great. How could I do that? It could be done with LINQ:
elements.Where(Function(s) s.innerText.StartsWith("St")).Where(Function(s) s.innerText.EndsWith("x")).Last.click
That works, but the flexibility of LINQ costs you some of the brevity. On the other hand, writing a full LINQ query allows "or" conditions, which my syntax would not allow for. Let’s see how we could get closer to what we want. In effect, we want a collection that lets you filter its items using the methods available on the item type.
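To make the "or" point concrete, here is a minimal sketch of a LINQ filter that accepts either of two prefixes. It is not code from my tool: `PageElement` is a hypothetical stand-in for an HTML element so the example runs without a browser, and `StartsWithEither` is a name made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for an HTML element (assumption, not the mshtml type).
Public Class PageElement
    Public Property TagName As String
    Public Property InnerText As String

    Public Sub New(ByVal tagName As String, ByVal innerText As String)
        Me.TagName = tagName
        Me.InnerText = innerText
    End Sub
End Class

Public Module OrFilterDemo
    ' Returns the elements whose text starts with either prefix.
    ' The chained dot syntax has no obvious way to express this "or".
    Public Function StartsWithEither(ByVal elements As IEnumerable(Of PageElement),
                                     ByVal firstPrefix As String,
                                     ByVal secondPrefix As String) As List(Of PageElement)
        Return elements.Where(Function(e) e.InnerText.StartsWith(firstPrefix) OrElse e.InnerText.StartsWith(secondPrefix)).ToList()
    End Function

    Public Sub Main()
        Dim elements As New List(Of PageElement) From {
            New PageElement("a", "Stax"),
            New PageElement("a", "Next"),
            New PageElement("a", "Home")
        }
        ' Prints "Stax" and then "Next"
        For Each match As PageElement In StartsWithEither(elements, "St", "Ne")
            Console.WriteLine(match.InnerText)
        Next
    End Sub
End Module
```

A single `Where` with `OrElse` covers a case that would need two separate chains in the syntax I wanted.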
In our example we have a collection of HTML elements. I want to call a method on the collection to filter it down to just links (items with an ’a’ tag which is an anchor). That’s fairly easy to accomplish.
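As a rough sketch of that first step, filtering a collection down to its links could be an extension method along these lines. `PageElement` is again a hypothetical stand-in for the real element type, and `Links` is simply the name from my wish-list syntax; the comparison ignores case on the assumption that the browser may report the tag name as "A".

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Runtime.CompilerServices

' Hypothetical stand-in for an HTML element (assumption, not the mshtml type).
Public Class PageElement
    Public Property TagName As String
    Public Property InnerText As String

    Public Sub New(ByVal tagName As String, ByVal innerText As String)
        Me.TagName = tagName
        Me.InnerText = innerText
    End Sub
End Class

Public Module ElementCollectionExtensions
    ' Keeps only the anchor elements, whatever the case of the tag name.
    <Extension()>
    Public Function Links(ByVal elements As IEnumerable(Of PageElement)) As IEnumerable(Of PageElement)
        Return elements.Where(Function(e) String.Equals(e.TagName, "a", StringComparison.OrdinalIgnoreCase))
    End Function

    Public Sub Main()
        Dim elements As New List(Of PageElement) From {
            New PageElement("A", "Next"),
            New PageElement("div", "Footer"),
            New PageElement("a", "Home")
        }
        ' Prints "Next" and then "Home"
        For Each link As PageElement In elements.Links()
            Console.WriteLine(link.InnerText)
        Next
    End Sub
End Module
```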
On the collection of links I want to filter based on the innerText value of each element. My syntax indicates that we should be able to call innerText on the collection of links. The decision here is what to return when that is called. My initial plan involved returning key-value pairs of strings and elements. These were returned in a custom type that had methods matching the string methods, like StartsWith and Contains. If we called the Contains method, it would go through the keys, and wherever a key contained the text it would add the paired element to the result. Said another way, it would return all the elements for which the selected method returns true. When it was done it would return a list of the original type, so I could start again and loop as many times as I wanted. It worked.
'Here is a simple object to contain a paired test string and an element
'This keeps them together
Public Class TextAndElementPair
    Public Text As String
    Public Element As IHTMLElement

    Sub New(ByVal _text As String, ByVal _element As IHTMLElement)
        Text = _text
        Element = _element
    End Sub
End Class

'Here is the object that is returned from the call to 'InnerText' that allows you to call Contains on all the elements
Public Class ElementsStringFilter
    Dim _elements As List(Of TextAndElementPair)

    Sub New(ByVal Elements As List(Of TextAndElementPair))
        _elements = Elements
    End Sub

    Public Function Contains(ByVal Text As String) As List(Of IHTMLElement)
        Dim Elements As New List(Of IHTMLElement)
        For Each Pair In _elements
            If Pair.Text.Contains(Text) Then Elements.Add(Pair.Element)
        Next
        Return Elements
    End Function
End Class
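For completeness, here is a sketch of how the call to InnerText might build those pairs and hand them to the filter. This is an illustration of the idea, not a copy of my original code: `PageElement`, `TextAndPageElement` and `PageElementStringFilter` are stand-ins for the mshtml-based classes above so that the example runs on its own, and the shape of the `InnerText` extension method is an assumption.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Runtime.CompilerServices

' Hypothetical stand-in for IHTMLElement.
Public Class PageElement
    Public Property InnerText As String

    Public Sub New(ByVal innerText As String)
        Me.InnerText = innerText
    End Sub
End Class

' Pairs the string being tested with the element it came from.
Public Class TextAndPageElement
    Public ReadOnly Text As String
    Public ReadOnly Element As PageElement

    Public Sub New(ByVal text As String, ByVal element As PageElement)
        Me.Text = text
        Me.Element = element
    End Sub
End Class

' Returned by InnerText(); exposes string-style tests over the whole collection.
Public Class PageElementStringFilter
    Private ReadOnly _pairs As List(Of TextAndPageElement)

    Public Sub New(ByVal pairs As List(Of TextAndPageElement))
        _pairs = pairs
    End Sub

    ' Returns the elements whose paired text contains the search text.
    Public Function Contains(ByVal text As String) As List(Of PageElement)
        Dim matches As New List(Of PageElement)
        For Each pair As TextAndPageElement In _pairs
            If pair.Text.Contains(text) Then matches.Add(pair.Element)
        Next
        Return matches
    End Function
End Class

Public Module PageElementExtensions
    ' Builds one text/element pair per element. Note that every call
    ' allocates a new list and a new pair object for each element.
    <Extension()>
    Public Function InnerText(ByVal elements As List(Of PageElement)) As PageElementStringFilter
        Dim pairs As New List(Of TextAndPageElement)
        For Each element As PageElement In elements
            pairs.Add(New TextAndPageElement(element.InnerText, element))
        Next
        Return New PageElementStringFilter(pairs)
    End Function

    Public Sub Main()
        Dim links As New List(Of PageElement) From {
            New PageElement("Next"),
            New PageElement("Previous")
        }
        Dim matches As List(Of PageElement) = links.InnerText().Contains("Next")
        Console.WriteLine(matches.Count) ' Prints 1
    End Sub
End Module
```

Because Contains returns a plain list of elements again, the result can be fed straight back into InnerText() for another round of filtering.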
I looked at the code and it looked convoluted and clunky. I also realized that I was creating a copy of most of the objects and more objects with every dot in my syntax. There had to be another way. In my next post I’ll go over what I did next.