AI Tools Are Scraping Your Website. Is That a Good Thing?

by Eric Karkovack on May 10, 2023

The rise of artificial intelligence (AI) has been disruptive. Things are changing rapidly. And it seems like this technology is posing new moral, ethical, and existential questions each day.

There are plenty of stories and opinions to choose from. But one recent incident caught my eye.

A website owner claimed that his site was being “hammered” by a content-scraping bot. The tool, img2dataset, downloads large volumes of images to build datasets for training AI models like Stable Diffusion.

The site’s owner opened an issue on the tool’s GitHub repository. He was advised to opt out of scraping, which meant adding specific HTTP headers to his website.
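For a sense of what that opt-out involves: img2dataset’s documentation describes skipping images whose HTTP responses carry X-Robots-Tag directives such as noai or noimageai (that is our reading of the project’s README; check it for the current defaults). Below is a minimal sketch, assuming a Python site served through WSGI, of a middleware that attaches such a header to every response. The names add_ai_opt_out and demo_app are hypothetical. If your images are served directly by nginx or Apache, the equivalent header would have to be configured there instead.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Headers = List[Tuple[str, str]]

# Directives img2dataset is understood to honor (assumption: verify against
# the project's README). "noai"/"noimageai" are an informal convention, not
# a web standard, so other scrapers are free to ignore them.
# "noindex"/"noimageindex" are deliberately left out: search engines honor
# those too, so they would also remove pages or images from search results.
AI_OPT_OUT = "noai, noimageai"


def add_ai_opt_out(app: Callable) -> Callable:
    """Hypothetical WSGI middleware: add an X-Robots-Tag header to every response."""

    def wrapped(environ: dict, start_response: Callable) -> Iterable[bytes]:
        def start_with_header(
            status: str, headers: Headers, exc_info: Optional[tuple] = None
        ):
            # Append rather than replace: several X-Robots-Tag headers may be
            # sent, so directives the app already set (e.g. noindex on a
            # private page) are left untouched.
            new_headers = list(headers) + [("X-Robots-Tag", AI_OPT_OUT)]
            return start_response(status, new_headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


# Stand-in for a real site (hypothetical): serves one fake image.
def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    start_response("200 OK", [("Content-Type", "image/jpeg")])
    return [b"...image bytes..."]


app = add_ai_opt_out(demo_app)
```

For a quick local check, wsgiref.simple_server.make_server can serve app, and a HEAD request should then show the extra header. Keep in mind this only helps against tools that actually look for it.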

This is our new reality. These tools are grabbing all manner of content – copyrighted images included. They’re regurgitating it to their users. Indeed, it’s the world’s biggest mash-up.

What’s more, it’s up to website owners to specify that they don’t want to participate. Is this as outrageous as it sounds? Let’s examine the issue and what it means for website owners.

Scraping Website Content for Profit Isn’t New

On one level, a tool scraping your website isn’t a novel idea. Search engines have been both indexing content and displaying relevant bits in results for years. In addition, RSS has allowed for retrieving text and images since the early days of the web.

And companies like Google have profited massively from these efforts. The more data they collect, the better their results, and the more eyeballs they attract. That, in turn, means bigger ad revenue.

It’s been the way of the world for a few decades now. Therefore, it’s no surprise that other companies are taking a similar approach.

After all, an AI developer needs a good source of content to “train” its tool. What better way to do so than by collecting as much data as possible? For them, the web is the gift that keeps on giving.

So, the mere fact that a bot is visiting your website and cataloging content isn’t a big deal. But maybe that’s where the similarities end.

Search engine bots have been indexing and scraping data for years.

Is There Any Benefit for Website Owners?

The big difference is in who benefits. When a search engine indexes your website, you stand to gain something. Better rankings mean more visitors – and potentially more customers. And if you practice search engine optimization (SEO), you’re asking Google to visit.

AI bots may not rise to the level of an uninvited guest. But they’re not exactly visiting to your benefit, either.

For example, when you ask ChatGPT to write code, it’s not thinking back to the computer science course it took in college. The tool is tapping into previously scraped content. True, it may not be a line-for-line copy (although sometimes it is). But the language model is using what it has “learned” to produce an answer.

Similarly, generating an image of Elon Musk riding a unicorn isn’t magic (sorry to spoil the fun). The various visual components had to come from somewhere. Original (and potentially copyrighted) images are key ingredients.

In both scenarios, the beneficiaries are the AI tool and the end user. The sources used to generate this content? They have more bot traffic added to their monthly bandwidth usage.

The developer of img2dataset has a slightly different take. Among their responses to concerns about requiring an opt-out:

“You will have many opportunities in the years to come to benefit from AI. I hope you see that sooner rather than later. As creators you have even more opportunities to benefit from it.”

Their logic seems to suggest that we’ll all benefit from AI at some point. So, allowing the tool to scrape your content is good for humanity. Or something like that.

The separation between Google’s search engine and its Bard AI tool is unclear.

To Block or Not to Block?

Deciding whether to block AI tools from scraping your website is complex. At the very least, it’s a multistep process.

Perhaps the easiest part is identifying your philosophy. Are you OK with your content being scraped? If so, carry on. If not, the other parts of the equation are more complicated.

For one, there’s no universal way to opt out of all AI scraping. The headers that opt you out of img2dataset work only for that tool. That means keeping track of popular tools and finding a way to block each of them.
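robots.txt is one per-crawler way to say no, though it only works for bots that choose to read and obey it. Below is a minimal sketch using Python’s standard urllib.robotparser to confirm that a rule set blocks one crawler while leaving search engines alone. CCBot, the crawler behind Common Crawl (whose archives are widely used as AI training data), is only an example token; each vendor documents its own user-agent name, and the list has to be maintained by hand. The function name is_allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: one group per crawler you want to exclude, then a
# catch-all group that keeps the site open to everyone else, including
# search engines. "CCBot" is an example; add one group per AI crawler,
# using the user-agent name each vendor documents.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("CCBot", "Googlebot"):
        print(agent, is_allowed(agent, "https://example.com/images/photo.jpg"))
```

Naming each unwanted crawler explicitly, rather than disallowing everything, is what keeps search engine indexing (and SEO) intact.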

And companies like Google and Microsoft are further complicating the conversation. Both own search engines. You likely want them to index your website. But they also have AI tools. Where is the line drawn between these different products?

For its part, Google’s Bard claims that it doesn’t scrape content from websites (I asked!). But in the same conversation, it also says that websites are among its data sources. Make what you will of those answers.

If you’d like to block all manner of AI tools, it won’t be easy. But that may not be true for long. I can envision services that cater to website owners who want nothing to do with content scraping. They may let us block scrapers more efficiently.

But until such time, this seems like a losing battle. AI is inevitable. And who has time to catalog every new app that hits the market? Plus, it may be difficult to block these tools without also negatively impacting SEO.

Blocking AI tools from scraping your website may require constant vigilance.

Website Owners Must Fend for Themselves

Not everyone will be as affected as the frustrated site owner from our introduction. In that case, img2dataset appears to have been downloading a large volume of images. Unless you’re in the same boat, your site probably won’t experience any problems.
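To find out whether a scraper is eating into your bandwidth, your server’s access log is a reasonable place to start. Below is a minimal sketch, assuming logs in the common Apache/nginx “combined” format, that totals the bytes served per user-agent string. The sample lines and the ExampleAIBot name are hypothetical. User-agent strings are self-reported, so a bot can claim to be anything.

```python
import re
from collections import Counter
from typing import Iterable

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
# Simplified pattern: requests containing escaped quotes won't match.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def bytes_by_agent(lines: Iterable[str]) -> Counter:
    """Sum response sizes per user-agent string."""
    totals: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines in other formats
        size = 0 if m["bytes"] == "-" else int(m["bytes"])  # "-" means no body
        totals[m["agent"]] += size
    return totals


# Hypothetical sample log lines.
SAMPLE = [
    '203.0.113.5 - - [10/May/2023:12:00:00 +0000] "GET /img/a.jpg HTTP/1.1" 200 500000 "-" "ExampleAIBot/1.0"',
    '203.0.113.5 - - [10/May/2023:12:00:01 +0000] "GET /img/b.jpg HTTP/1.1" 200 250000 "-" "ExampleAIBot/1.0"',
    '198.51.100.7 - - [10/May/2023:12:01:00 +0000] "GET / HTTP/1.1" 200 12000 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/May/2023:12:01:05 +0000] "HEAD / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for agent, total in bytes_by_agent(SAMPLE).most_common():
        print(f"{total:>10,} bytes  {agent}")
```

In practice you would pass an open log file instead of SAMPLE; a single unfamiliar agent at the top of the list is the kind of signal the site owner above noticed.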

But the issue goes much deeper. It should make us think about how we value our content. And we should question what sort of rights (if any) these tools have. Can they simply take what they want? Or should there be guidelines outlining what is and isn’t permissible?

Meaningful regulation of the industry could be months or even years away. In the interim, website owners are left to fend for themselves.

As part of the effort, it’s important to make your voice heard. Encourage companies to make opting out of scraping a transparent process. Express your concerns to elected officials and other people of influence.

It may not slow down the onslaught of AI tools. But it could prevent things from getting too far out of hand. That will benefit us all.

The post AI Tools Are Scraping Your Website. Is That a Good Thing? appeared first on Speckyboy Design Magazine.
