Organisations are relying on data more than ever and this upward trend is set to continue for a while. This provides an opportunity for the public data collection industry to mature and progress, putting more focus on data quality, ethics, and the employment of more advanced technical solutions. These were the key messages industry experts shared at the prominent web scraping conference OxyCon 2021.
One of the industries that has contributed most to the growing popularity of web scraping in recent years is e-commerce. While external data can benefit many industries, for e-commerce it is indispensable. Tomas Montvilas, Chief Commercial Officer at Oxylabs, says that collecting external data in real time is becoming a necessity to remain competitive.
“E-commerce businesses can use external data to optimise assortment in digital shelves, enable real-time dynamic pricing, monitor search placements in marketplaces, and much more. Such processes help companies get critical strategic insights and stay ahead of the competition”, he told the OxyCon audience.
When businesses rely on data, the question of data quality becomes critical. Unfortunately, according to Allen O’Neill, CTO at Web Data Works and Microsoft Regional Director, most data collection systems are not built with data quality in mind.
“Poor data quality can strongly affect the bottom line. Therefore, it must be measured from the point of data collection all the way to the point when it is delivered to the ultimate consumer of data”, suggested Allen.
Another factor that could damage the bottom line is the reputational risk caused by unethical data collection. Con Conlon, the Managing Director of Merit Data & Technology, addressed this topic at the event, suggesting that companies must have ethical data collection policies: besides providing clear do’s and don’ts, such policies lead to better code and better maintenance outcomes.
“Web scraping can do much to underpin transparent and efficient markets, but we have a duty to undertake this activity in a manner which is thoughtful, considerate, and carries minimal impact on data sources”, he said.
When it comes to the newest technical trends in web scraping, machine learning (ML) is behind most of them. Jurijus Gorskovas, Machine Learning Engineer at Oxylabs, presented practical ML applications in web scraping.
“Operations such as content classification, content extraction or CAPTCHA solving can become a lot more efficient with automation. Eventually, we have managed to empower our scraper with at least five ML solutions that save costs and development resources and improve our service delivery success”, Jurijus said.
OxyCon is an annual gathering of the web scraping community hosted by Oxylabs, the data gathering solutions provider. The 2021 event took place on 25-26 August, and over 1,000 people registered to participate.