I Made a Walmart Price Scrapper

I built a price checker for Walmart that will both identify pricing of whatever item I choose both in store and compare it to online. Why? Because I didn’t want to pay a service to do it.

It’s hosted on my personal site, but for various reasons I won’t share it. I can, however, share a screenshot:

It’s quite simple and took about 5 hours total. I add a link to an item I want at the bottom, and I can give it a name then click add. Once a day, I have software that scans Walmart’s website and finds the pricing for both in-store and online.

What technologies are used?

  • Visual Studio 2021 C# with EO.WebBrowser
  • MySQL
  • PHP
  • Virtual Windows Server 2017

I could’ve used JavaScript to scrape the web and remove the whole C# software hosted on a Virtual Private Server idea, but I didn’t feel like it.

How does it work?

In layman’s terms:

  1. Have the windows application run once a day in the AM
  2. Pull listed items from the database (in this case Nintendo Switch games)
  3. Open website in windows application like any webbrowser and search for price
  4. Update price in database
  5. Display prices on webpage

Each product in Walmart’s local store has a “/grocery” link in it like the following: https://www.walmart.com/grocery/ip/Mario-Kart-8-Deluxe-Nintendo-Nintendo-Switch-045496590475/55432571

If I remove the /grocery it displays the same product from the online store. Notice in the screenshot above how some prices are different. Supply and demand, I guess. The primary thing I want to focus on is finding anything on clearance at a local store. At some point these games will drop below $20 on clearance to get them out of stock for new items/games, but the price online may always be closer to $40. Conversely, some items go on sale online and not at the store. This could work with any product, really, but I’m primarily using it for video games because I can. I’m sure I’ll find other things to add and I’ll continue to run it because it’s cool.

How’s it done:

First, I open the website in EO.WebBrowser webview control and load it as an HTML document. info[1] is just the URL (shared above) pulled from a created array list from the database.

webView1.LoadRequestAndWait(new Request(info[1]));
string html = webView1.GetHtml();

Then I find all the HTML tags and scan through them all:

string[] getinputs = html.Split('<', '>');
foreach (string str in getinputs)
{
     // do things here
}

The problem I noticed is that the local stores have different classes, so I can’t repeat the code for both local and online. In local stores there is a <span> tag which includes “salePrice”.

A few tests prove that this captures the right line, but the price is not on the same line when I read the HTML file. When it finds this line I set a boolean in the loop, so the next loop it pulls the price. I’m sure there’s an easier way to do this, but whatever.

The online store has a different system for marking the price.

For this one I’ll search for “price-characteristic“. The price for the online version is in the same line. I ended up splitting this line into an array based on the “=” sign so it would be easier to pull.

string[] data = str.Split('=');

This works fine, because the 4th array has the pricing I need from the “content“. Note that “0” is considered the first array.

It needs some formatting to actually get the price I need, so when I call the function I need I use:

data[3].Substring(1, 5)

data[3] is the array I’m calling, the 1 starts at the number “4” for price, and “5” is how long it is. Of course, this will fail spectacularly should the price go below $10. Well, not fail, it will just look like “9.94\” for example. Finally, I update the table. The next part is calling a table update using MySQL:

private void UpdatePrice(string id, string price, Boolean grocery = true)
{
comtext = "UPDATE walmartcheck SET Price = '" + price + "' WHERE ID = '" + id + "';";

using (var connection = new MySqlConnection(connString))
{
    //Open connection
    connection.Open();

    MySqlCommand comm = connection.CreateCommand();
    comm.CommandText = comtext;
    comm.ExecuteNonQuery();

    connection.Close();
}
}

The function UpdatePrice has the “ID”, which is the primary key in the database of the link in question. Price is the price received from the scrapping (see code below). The boolean “grocery” is just determining the column to use in the MySQL query. I use “OldPrice” because that was the original name of the column and never bothered changing it to “Online” – which it actually represents.

if(next == 1)
{
    UpdatePrice(info[0], str.Substring(1));
    break;
}
if (str.Contains("salePrice")) // in-store
{
    next = 1;
}
if(str.Contains("price-characteristic")) // online
{
    string[] data = str.Split('=');
    if(data.Length >= 4)
    {
        UpdatePrice(info[0], data[3].Substring(1, 5), false);
    } else
    {
        UpdatePrice(info[0], "??.??", false);
    }
                        
    break;
}

To display it on the web, I use a simple query string:

$sql = "SELECT * FROM `walmartcheck` WHERE Active = 1 ORDER BY ID ASC";

Then I build the table:

<table style="width:100%" border="1">
  <tr>
    <th>Item</th>
    <th>Price</th>
    <th>Online</th>
    <th>Manage</th>
  </tr>

And a loop for the table:

// loop
 $result = $conn->query($sql);

while ($row = mysqli_fetch_row($result)):  
       echo '<tr style="border:1px solid black;">
          <td>'.$row[2].'</td>
          <td><a href="'.$row[1].'" target="_blank">'.$row[3].'</a></td>
          <td><a href="'.str_replace('/grocery','',$row[1]).'" target="_blank">'.$row[4].'</a></td>
          <td><a href="secretwebsite.php?m='.$row[0].'">Hide</td>
        </tr>';
endwhile; 

This is honestly not a best practice. I’m placing HTML tags within a PHP echo call (which displays text), and calling the database columns. Notice the “.$row[0].” What this is doing is calling the primary key of the row. When I (or a user) clicks it, it will hide that so the program above doesn’t look for it anymore. $row[1] is the link, $row[2] is the name of the item, $row[3] is the in-store price, and $row[4] is the online price. Next, I close the table:

</table>

In order to update that link to a hidden state, it’s relatively simple:

if(isset($_GET['m']))
{
    $sql = "UPDATE `walmartcheck` SET Active = 0 WHERE ID = ".$_GET['m'];

    $conn->query($sql);
}

First I check for “m=” in the URL, and if it’s there then it updates the related to an inactive state. I’m not a fan of deleting rows and it’s a best practice not to, so just about everything I build has an Active column and a state of 0 or 1 (hidden or available). You may wonder if I have any checks for bogus numbers, and I don’t because it doesn’t do anything.

And then the form:

<form action="secretwebsite.php" method="post">
    Link: <input type="link" name="link"><br>
    Name: <input type="name" name="name"><br>
    <input type="submit">
</form>

To which is inserted by PHP:

if(isset($_POST['link']) && isset($_POST['name']))
{
    $sql = "INSERT INTO `walmartcheck`( `Link`, `Name`) VALUES ('".$_POST['link']."','".$_POST['name']."');";

    $conn->query($sql);
}

This took about 5 hours total to develop. Very sloppy and probably not secure, but it was fun regardless.

2 replies on “I Made a Walmart Price Scrapper”

Comments are closed.