PowerQuery M: import data from 10 websites and put it into a single table

56 views Asked by At

Could someone please help me fix this VBA code so that the output doesn't have any null rows and all data for a particular date is displayed parallely? Also, it'd be great if you could optimize this macro (it takes sooo long for it to load) so that it doesn't download all data from those website but just accesses these two containers from each website to download tables:

  1. XPath="/html/body/main/div/div[4]/div/div/div/div[3]/div[1]/div/div[2]/div[2]/div[1]/table"
  2. XPath="/html/body/main/div/div[4]/div/div/div/div[3]/div[1]/div/div[2]/div[2]/div[2]/table"

Thank you so much!


let
    // Define a list of slugs
    Slugs = {
        "IBIT",
        "FBTC",
        "BITB",
        "ARKB",
        "BTCO",
        "EZBC",
        "BRRR",
        "HODL",
        "BTCW",
        "GBTC"
    },

    // Define a function to construct URLs for each slug
    ConstructURL = (Slug) => "https://ycharts.com/companies/" & Slug & "/total_assets_under_management",

    // Define a function to apply transformation steps to each URL
    TransformData = (URL) =>
    let
        Source = Web.Page(Web.Contents(URL)),
        Data0 = Source{0}[Data],
        Data1 = Source{1}[Data],
        CombinedData = Table.Combine({Data0, Data1}),
        // Extract the company name from the URL
        CompanyName = Text.BetweenDelimiters(URL, "companies/", "/"),
        // Rename columns dynamically based on the company name
        RenamedColumns = Table.RenameColumns(CombinedData, {{"Date", "Date"}, {"Value", "Value_" & CompanyName}}),
        // Change the data type of the columns
        ChangedType = Table.TransformColumnTypes(RenamedColumns, {{"Date", type date}, {"Value_" & CompanyName, type text}})
    in
        ChangedType,

    // Construct URLs for each slug
    URLs = List.Transform(Slugs, each ConstructURL(_)),

    // Apply transformation to each URL and combine the results
    CombinedTables = List.Transform(URLs, each TransformData(_)),

    // Combine new data with existing data
    CombinedTable = if List.Count(CombinedTables) > 0 then Table.Combine(CombinedTables) else null
in
    CombinedTable


Update: I've managed to put everything in two columns but I'd rather have one date column and ten columns with values:

let
    // Define a list of slugs
    Slugs = {
        "IBIT",
        "FBTC",
        "BITB",
        "ARKB",
        "BTCO",
        "EZBC",
        "BRRR",
        "HODL",
        "BTCW",
        "GBTC"
    },

    // Define a function to construct URLs for each slug
    ConstructURL = (Slug) => "https://ycharts.com/companies/" & Slug & "/total_assets_under_management",

    // Define a function to apply transformation steps to each URL
    TransformData = (URL) =>
    let
        Source = Web.Page(Web.Contents(URL)),
        Data0 = Source{0}[Data],
        Data1 = Source{1}[Data],
        CombinedData = Table.Combine({Data0, Data1}),
        // Extract the company name from the URL
        CompanyName = Text.BetweenDelimiters(URL, "companies/", "/"),
        // Change the data type of the columns
        ChangedType = Table.TransformColumnTypes(CombinedData, {{"Date", type date}, {"Value", type text}}),
        // Add a custom column for company name
        AddedCompanyColumn = Table.AddColumn(ChangedType, "Company", each CompanyName)
    in
        AddedCompanyColumn,

    // Construct URLs for each slug
    URLs = List.Transform(Slugs, ConstructURL),

    // Apply transformation to each URL and combine the results
    CombinedTables = List.Transform(URLs, each TransformData(_)),

    // Combine new data with existing data
    CombinedTable = if List.Count(CombinedTables) > 0 then Table.Combine(CombinedTables) else null
in
    CombinedTable

Could someone please help me fix this VBA code so that the output doesn't have any null rows and all data for a particular date is displayed parallely? Also, it'd be great if you could optimize this macro (it takes sooo long for it to load) so that it doesn't download all data from those website but just accesses these two containers from each website to download tables:

  1. XPath="/html/body/main/div/div[4]/div/div/div/div[3]/div[1]/div/div[2]/div[2]/div[1]/table"
  2. XPath="/html/body/main/div/div[4]/div/div/div/div[3]/div[1]/div/div[2]/div[2]/div[2]/table"

Thank you so much!

1

There are 1 answers

1
taller On BEST ANSWER

Pls try.

let
    Comps = {"IBIT","FBTC","BITB"}, 
    GetTable = (Variable as text) =>
        let
            Source = Web.BrowserContents("https://ycharts.com/companies/" & Variable & "/total_assets_under_management"),
            Table1 = Html.Table(Source, {{"Column1", "DIV.col-6:nth-child(2) > TABLE.table:nth-child(1) > TBODY > TR > :nth-child(1)"}, {"Column2", "DIV.col-6:nth-child(2) > TABLE.table:nth-child(1) > TBODY > TR > :nth-child(2)"}}, [RowSelector="DIV.col-6:nth-child(2) > TABLE.table:nth-child(1) > TBODY > TR"]),
            Table2 = Html.Table(Source, {{"Column1", "DIV.col-6:nth-child(1) > TABLE.table:nth-child(1) > TBODY > TR > :nth-child(1)"}, {"Column2", "DIV.col-6:nth-child(1) > TABLE.table:nth-child(1) > TBODY > TR > :nth-child(2)"}}, [RowSelector="DIV.col-6:nth-child(1) > TABLE.table:nth-child(1) > TBODY > TR"]),
            CombinedTables = Table.Combine({Table1, Table2}),
            output = Table.RenameColumns(CombinedTables, {{"Column1", "Date"}, {"Column2", Variable}})
        in
            output,
    Tables = List.Transform(Comps, each GetTable(_)),
    MergedTables = List.Accumulate(List.Skip(Tables), Tables{0}, (state, current) => Table.Join(state, "Date", current, "Date"))
in
    MergedTables

  • Based on my testing in Microsoft 365, VBA is more efficient compared to Power Query. However, there might be a better way to optimize the Power Query code.
Option Explicit

Sub GetDataFromWebPages()
    Dim Comps As Variant
    Comps = Array("IBIT", "FBTC", "BITB", "ARKB", "BTCO", "EZBC", "BRRR", "HODL", "BTCW", "GBTC")
    Dim sData As String, oHtml As Object
    Dim Table1 As Object, Table2 As Object, oDic As Object
    Dim i As Long, j As Long, vTab, sKey, sVal, arrVal
    Set oHtml = CreateObject("HTMLFile")
    Set oDic = CreateObject("Scripting.Dictionary")
    For j = 0 To UBound(Comps)
        sData = GetWebData("https://ycharts.com/companies/" & Comps(j) & "/total_assets_under_management")
        oHtml.body.innerHTML = sData
        Set Table1 = oHtml.getElementsByClassName("col-6")(0).getElementsByTagName("table")(0)
        Set Table2 = oHtml.getElementsByClassName("col-6")(1).getElementsByTagName("table")(0)
        sKey = "Date"
        sVal = "Value_" & Comps(j)
        If oDic.Exists(sKey) Then
            arrVal = oDic(sKey)
            arrVal(j) = sVal
            oDic(sKey) = arrVal
        Else
            ReDim arrVal(UBound(Comps))
            arrVal(j) = sVal
            oDic(sKey) = arrVal
        End If
        For Each vTab In Array(Table1, Table2)
            For i = 1 To vTab.Rows.Length - 1
                sKey = vTab.Rows(i).Cells(0).innerText
                sVal = vTab.Rows(i).Cells(1).innerText
                If oDic.Exists(sKey) Then
                    arrVal = oDic(sKey)
                    arrVal(j) = sVal
                    oDic(sKey) = arrVal
                Else
                    ReDim arrVal(UBound(Comps))
                    arrVal(j) = sVal
                    oDic(sKey) = arrVal
                End If
            Next i
        Next
    Next
    Dim ws As Worksheet, arrRes, iR As Long
    ReDim arrRes(1 To oDic.Count, 1 To UBound(Comps) + 2)
    For Each sKey In oDic.Keys
        arrVal = oDic(sKey)
        iR = iR + 1
        arrRes(iR, 1) = sKey
        For j = 0 To UBound(arrVal)
            arrRes(iR, j + 2) = arrVal(j)
        Next
    Next
    Set ws = ThisWorkbook.Sheets.Add
    With ws.Range("A1")
        .Resize(oDic.Count, UBound(Comps) + 2).Value = arrRes
        .CurrentRegion.Sort key1:=.EntireColumn, Header:=xlYes
        .CurrentRegion.EntireColumn.AutoFit
        .EntireColumn.NumberFormat = "MM/dd/yyyy"
    End With
End Sub
Function GetWebData(url As String) As String
    Dim xmlHttpRequest As Object
    Set xmlHttpRequest = CreateObject("MSXML2.XMLHTTP")
    xmlHttpRequest.Open "GET", url, False
    xmlHttpRequest.send
    GetWebData = xmlHttpRequest.responseText
End Function