31: Crawling the Web using Elixir with Oleg Tarasenko and Tze Yiing

Thinking Elixir Podcast - A podcast by ThinkingElixir.com - Tuesdays

We talk with Oleg Tarasenko and Tze Yiing about crawling the web using Elixir. Oleg created the Crawly project to help solve this problem, and Tze Yiing joined him as a contributor and maintainer. We cover how Elixir is well suited to orchestrating crawling, how to deal with login pages, the legal concerns, building a codeless scraper, and much more!

Show Notes online - http://podcast.thinkingelixir.com/31

Elixir Community News

https://dashbit.co/blog/ten-years-ish-of-elixir – January 9th marked 10 years since the first commit to the Elixir repository
https://github.com/elixir-lang/elixir/commit/337c3f2d569a42ebd5fcab6fef18c5e012f9be5b – First commit on the repository
https://twitter.com/josevalim/status/1349010127270129670 – Jose Valim reveals that his secret project is called 'Nx'
https://remote.com/blog/welcoming-elixir-creator-jose-valim – Jose Valim joins Remote as a Technical Advisor
https://twitter.com/josevalim/status/1347858475267854336 – ExUnit will catch the SIGQUIT signal from CTRL+\ and show the tests that were running
https://github.com/elixir-lang/elixir/blob/master/lib/mix/lib/mix/tasks/test.ex#L34 – ExUnit will print how much time the test suite spent on async tests vs sync tests
https://twitter.com/fhunleth/status/1348092050487570433 – Nerves support on the M1 is looking good
https://www.youtube.com/playlist?list=PLqj39LCvnOWZl_Pb0Y7wGWijKbTvL4gJg – ElixirConf 2020 videos have all been publicly released!

Do you have some Elixir news to share? Tell us at @ThinkingElixir or email at [email protected]

Discussion Resources

https://oltarasenko.medium.com/web-scraping-with-elixir-and-crawly-extracting-data-behind-authentication-a52584e9cf13
https://oltarasenko.medium.com/using-elixir-and-crawly-for-price-monitoring-7364d345fc64 – Using Elixir for price monitoring
https://hex.pm/packages/crawly
https://github.com/oltarasenko/crawly
https://www.erlang-solutions.com/blog/web-scraping-with-elixir.html – Oleg's older web scraping with Elixir article
https://www.erlang-solutions.com/blog/how-to-build-a-machine-learning-project-in-elixir.html – Building a machine learning project with Elixir, TensorFlow and Crawly
https://oltarasenko.medium.com/what-is-web-scraping-and-why-you-might-want-to-use-it-a0e4b621f6d0 – What is web scraping, and why you might want to use it?
https://www.pillowskin.com – Ziinc's project using scraping and aggregation
https://www.tensorflow.org/
https://oltarasenko.medium.com/the-unofficial-guide-to-extracting-google-search-results-in-2021-with-elixir-7a6ef80d0f5b
https://scrapy.org/
https://github.com/fredwu/crawler
https://www.eff.org/deeplinks/2019/09/victory-ruling-hiq-v-linkedin-protects-scraping-public-data – EFF legal interpretation of the hiQ v. LinkedIn scraping case
https://github.com/scrapinghub/splash/
https://www.joinhoney.com/
https://hexdocs.pm/crawly/readme.html#quickstart – Crawly quickstart guide
https://hexdocs.pm/crawly/tutorial.html – Crawly tutorial
https://github.com/oltarasenko/crawly_ui – Crawly UI project
http://crawlyui.com/ – Crawly UI project page
Data is the new gold
https://t.me/elixir_crawly – Crawly Telegram group

Guest Information

https://github.com/oltarasenko – Oleg on GitHub
https://oltarasenko.medium.com/ – Oleg's blog
https://twitter.com/tzeyiing – Lee TzeYiing on Twitter
https://github.com/Ziinc – Lee TzeYiing on GitHub
https://www.tzeyiing.com – Lee TzeYiing's blog

Find us online

Message the show - @ThinkingElixir
Email the show - [email protected]
Mark Ericksen - @brainlid
David Bernheisel - @bernheisel
Cade Ward - @cadebward
