sources/academy/webscraping/scraping_basics_javascript2/07_extracting_data.md
+33 -33 (33 additions & 33 deletions)
@@ -36,14 +36,14 @@ It's because some products have variants with different prices. Later in the cou
Ideally we'd go and discuss the problem with those who are about to use the resulting data. For their purposes, is the fact that some prices are just minimum prices important? What would be the most useful representation of the range for them? Maybe they'd tell us that it's okay if we just remove the `From` prefix?
In other cases, they'd tell us the data must include the range. And in cases when we just don't know, the safest option is to include all the information we have and leave the decision on what's important to later stages. One approach could be having the exact and minimum prices as separate values. If we don't know the exact price, we leave it empty:
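A minimal sketch of that representation in plain JavaScript, assuming the price text is already trimmed and that variant products carry a `From ` prefix. The `parsePriceRange` helper and the sample price strings are illustrative, not taken from the lesson:

```js
// Hypothetical helper: splits a price label into exact and minimum prices.
// Assumes variant products are labeled with a "From " prefix.
function parsePriceRange(priceText) {
  if (priceText.startsWith("From ")) {
    // Only the lower bound is known, so the exact price stays empty.
    return { minPrice: priceText.replace("From ", ""), price: null };
  }
  // A single price is both the minimum and the exact price.
  return { minPrice: priceText, price: priceText };
}

console.log(parsePriceRange("From $1,398.00"));
// { minPrice: '$1,398.00', price: null }
console.log(parsePriceRange("$74.95"));
// { minPrice: '$74.95', price: '$74.95' }
```

Keeping both keys on every item means later stages can decide how to treat ranges without re-scraping.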
@@ -100,9 +100,9 @@ Often, the strings we extract from a web page start or end with some amount of w
We call the operation of removing whitespace _trimming_ or _stripping_, and it's so useful in many applications that programming languages and libraries include ready-made tools for it. Let's add JavaScript's built-in [.trim()](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim):

```js
-const titleText = title.text().trim();
+const title = $title.text().trim();

-const priceText = price.text().trim();
+const priceText = $price.text().trim();
```
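
To see what trimming does in isolation, here is a small sketch on a made-up string (the product name is a placeholder). `JSON.stringify()` is used only to make the whitespace visible when printing:

```js
// Sample string with the kind of surrounding whitespace found in HTML text nodes.
const rawTitle = "\n      Sample Product\n    ";
const cleanTitle = rawTitle.trim();

console.log(JSON.stringify(rawTitle)); // "\n      Sample Product\n    "
console.log(JSON.stringify(cleanTitle)); // "Sample Product"
```

Note that `.trim()` only removes whitespace at the start and end of the string; the space between the two words stays.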

## Removing dollar sign and commas
@@ -124,7 +124,7 @@ The demonstration above is inside the Node.js' [interactive REPL](https://nodejs
We need to remove the dollar sign and the commas used as thousands separators. For this type of cleaning, [regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions) are often the best tool for the job, but in this case [`.replace()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace) is also sufficient:

```js
-const priceText = price
+const priceText = $price
   .text()
   .trim()
   .replace("$", "")
```
@@ -137,7 +137,7 @@ Now we should be able to add `parseFloat()`, so that we have the prices not as a
@@ -156,7 +156,7 @@ Great! Only if we didn't overlook an important pitfall called [floating-point er
These errors are small and usually don't matter, but sometimes they can add up and cause unpleasant discrepancies. That's why it's typically best to avoid floating-point numbers when working with money. We won't store dollars, but cents:
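
A minimal sketch of the idea, assuming price labels look like `$1,398.00`. The `toCents` helper is hypothetical, and rounding after `parseFloat()` is just one possible approach, not necessarily the one the lesson goes on to use:

```js
// Hypothetical helper: converts a price label such as "$1,398.00" to integer cents.
function toCents(priceText) {
  const digits = priceText
    .replace("$", "")
    .replaceAll(",", ""); // every thousands separator, not just the first one
  // parseFloat() yields dollars; rounding absorbs tiny floating-point errors.
  return Math.round(parseFloat(digits) * 100);
}

console.log(0.1 + 0.2); // 0.30000000000000004
console.log(toCents("$1,398.00")); // 139800
console.log(toCents("$74.95")); // 7495
```

Once the values are integers, adding or comparing prices no longer accumulates fractional errors.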
sources/academy/webscraping/scraping_basics_javascript2/08_saving_data.md
+28 -26 (28 additions & 26 deletions)
@@ -25,7 +25,7 @@ We should use widely popular formats that have well-defined solutions for all th
## Collecting data

-Producing results line by line is an efficient approach to handling large datasets, but to simplify this lesson, we'll store all our data in one variable. This'll take three changes to our program:
+Producing results line by line is an efficient approach to handling large datasets, but to simplify this lesson, we'll store all our data in one variable. This'll take four changes to our program:
-Before looping over the products, we prepare an empty array. Then, instead of printing each line, we append the data of each product to the array in the form of a JavaScript object. At the end of the program, we print the entire array at once.
+Instead of printing each line, we now return the data for each product as a JavaScript object. We've replaced `.each()` with [`.map()`](https://cheerio.js.org/docs/api/classes/Cheerio#map-3), which also iterates over the selection but, in addition, collects all the results and returns them as a Cheerio collection. We then convert it into a standard JavaScript array by calling [`.get()`](https://cheerio.js.org/docs/api/classes/Cheerio#call-signature-32). Near the end of the program, we print the entire array.
+
+:::tip Advanced syntax
+
+When returning the item object, we use [shorthand property syntax](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Object_initializer#property_definitions) to set the title, and [spread syntax](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax) to set the prices. It's the same as if we wrote the following:
+
+```js
+{
+  title: title,
+  minPrice: priceRange.minPrice,
+  price: priceRange.price,
+}
+```
+
+:::
+
+The program should now print the results as a single large JavaScript array:

```text
$ node index.js
@@ -91,20 +107,6 @@ $ node index.js
]
```
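
The collection step changed in this hunk can be sketched without Cheerio. The example below swaps in the built-in `Array.prototype.map()` on made-up sample data, so it only approximates the lesson's code; the `products` array and its values are assumptions for illustration:

```js
// Sample input standing in for scraped products (values are made up).
const products = [
  { title: "Sample Speaker", priceRange: { minPrice: 7495, price: 7495 } },
  { title: "Sample TV", priceRange: { minPrice: 139800, price: null } },
];

// Array.prototype.map() passes (element, index) to its callback. Cheerio's
// .map() passes (index, element) instead and needs .get() to yield an array.
const data = products.map(({ title, priceRange }) => {
  // Shorthand property for the title, spread syntax for the price fields.
  return { title, ...priceRange };
});

console.log(data);
```

Each resulting item is a flat object with `title`, `minPrice`, and `price` keys, which is the shape the exports later in the lesson rely on.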

-:::tip Spread syntax
-
-The three dots in `{ title: titleText, ...priceRange }` are called [spread syntax](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax). It's the same as if we wrote the following:
-
-```js
-{
-  title: titleText,
-  minPrice: priceRange.minPrice,
-  price: priceRange.price,
-}
-```
-
-:::
-

## Saving data as JSON

The JSON format is popular primarily among developers. We use it for storing data, configuration files, or as a way to transfer data between programs (e.g., APIs). It originated from the syntax of JavaScript objects, but people now use it across programming languages.
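
As a quick illustration of that relationship, the sketch below serializes a made-up item with the built-in `JSON.stringify()` and reads it back with `JSON.parse()`. The `items` array is sample data, with integer prices assumed because the lesson stores cents:

```js
// Sample data (made up); prices are integers because the lesson stores cents.
const items = [{ title: "Sample Speaker", minPrice: 7495, price: 7495 }];

// The third argument asks for 2-space indentation to keep the text readable.
const jsonText = JSON.stringify(items, null, 2);
console.log(jsonText);

// Parsing the text back yields an equivalent structure.
const restored = JSON.parse(jsonText);
console.log(restored[0].title); // Sample Speaker
```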
@@ -202,7 +204,7 @@ In this lesson, we created export files in two formats. The following challenges
### Process your JSON

-Write a new Node.js program that reads `products.json`, finds all products with a min price greater than $500, and prints each of them.
+Write a new Node.js program that reads the `products.json` file we created in the lesson, finds all products with a min price greater than $500, and prints each of them.
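
One possible starting point for this exercise, shown on inline sample data rather than a real file. It assumes prices are stored as integer cents, so $500 becomes `50000`; the `findExpensive` helper and the sample products are hypothetical:

```js
// Hypothetical helper: returns products whose minimum price exceeds a limit.
// Assumes prices are stored as integer cents, so $500 is 50000.
function findExpensive(jsonText, limitCents) {
  const products = JSON.parse(jsonText);
  return products.filter((product) => product.minPrice > limitCents);
}

// Inline sample standing in for the contents of products.json (values made up).
const sampleJson = JSON.stringify([
  { title: "Sample Speaker", minPrice: 7495, price: 7495 },
  { title: "Sample TV", minPrice: 139800, price: null },
]);

for (const product of findExpensive(sampleJson, 50000)) {
  console.log(product);
}
```

In an actual program, the JSON text would come from the file instead, for example via `readFile("products.json", "utf-8")` from `node:fs/promises`.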