Wednesday, April 27, 2022

Octoparse

Let’s focus on Octoparse, a web scraping tool that helps us quickly fetch data from websites without writing code. Anyone can use it to build a crawler in just minutes, as long as the data is visible on the web page. If you asked me to describe this tool in a few words, I would say this...

Tuesday, April 26, 2022

Web-Scraping

This is the process of extracting large and varied volumes of data (content) from a website in a standard format that can be sliced and diced later. It is part of data collection in data analytics and data science, and the results are saved as flat files (.csv, .json, etc.) or stored in a database. The scraped data will usually...
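As a rough illustration of the extract-then-save flow described above, here is a minimal sketch using only the Python standard library. The HTML snippet, the `TableParser` class, and the `scrape_table` and `to_csv` helpers are all hypothetical names invented for this example; a real scraper would download the page first and would likely need more robust parsing.

```python
import csv
import io
import json
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a URL.
SAMPLE_HTML = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Springfield</td><td>1200</td></tr>
  <tr><td>Shelbyville</td><td>950</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def to_csv(records):
    """Serialize the records as CSV text (flat-file format)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = scrape_table(SAMPLE_HTML)
csv_text = to_csv(records)                 # flat file as CSV
json_text = json.dumps(records, indent=2)  # flat file as JSON
```

Writing `csv_text` or `json_text` to disk, or inserting `records` into a database, would complete the collection step.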

Saturday, April 23, 2022

Reading the Data using Spark

Let's continue from where we left off in the previous post.
    df_spark = spark_session.read.csv('sample_data/california_housing_train.csv')
In the above code, the Spark session variable calls the read.csv() function to read data that is in CSV format. Now, if you remember...

Friday, April 22, 2022

PySpark using Python

Apache Spark is an engine for running data analysis, data engineering, and machine learning tasks, both in the cloud and on a local machine. It can use either a single machine or a cluster, i.e. a distributed system. We already have some relevant...

Thursday, April 21, 2022

Qubits

“Qubit” is short for “quantum bit.” While a bit can only be 0 or 1, a qubit can exist in more states. Qubits are surprising, fascinating, and powerful. They follow strange rules which may not initially seem natural to you. According to physics, these rules may be how nature itself works at the level...
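To make "more states" a little more concrete, here is a minimal sketch that assumes the standard textbook description of a single qubit: two complex amplitudes, one for 0 and one for 1, whose squared magnitudes give the probabilities of each measurement result. The function names `normalize` and `measurement_probabilities` are made up for this illustration and do not come from any quantum library.

```python
import math


def normalize(alpha, beta):
    """Scale the two amplitudes so their squared magnitudes sum to 1."""
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    if norm == 0:
        raise ValueError("at least one amplitude must be non-zero")
    return alpha / norm, beta / norm


def measurement_probabilities(alpha, beta):
    """Return (P(0), P(1)) for a qubit with amplitudes alpha and beta."""
    alpha, beta = normalize(alpha, beta)
    return abs(alpha) ** 2, abs(beta) ** 2


# A classical-like state: the qubit is definitely 0.
p0_zero, p1_zero = measurement_probabilities(1, 0)

# An equal superposition: 0 and 1 are equally likely when measured.
p0_equal, p1_equal = measurement_probabilities(1, 1)

# Amplitudes may be complex; only their magnitudes affect these probabilities.
p0_complex, p1_complex = measurement_probabilities(1, 1j)
```

A classical bit corresponds to the first case only, while a qubit can take any pair of amplitudes in between.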