Commit 3d7904ea authored by tssasha

edit data

parent 6e1d6073
......@@ -2,11 +2,15 @@
Automatic query suggestions for search.
Data: https://at.ispras.ru/owncloud/index.php/s/ilMufYYEqA8bUvT
Data: https://at.ispras.ru/owncloud/index.php/s/xkRJfXHOQQs4cpH
## Instructions for run
* Build `./run build`
* Download yelp (only needs to be done once) `./run get-yelp`
* Load the data `./run reindex-yelp`
* Start Solr `./run all`
\ No newline at end of file
* Start Solr `./run all`
## Testing instructions
`test/words.py` -> `test/request.py` -> `test/results.csv`
\ No newline at end of file
#!/bin/bash
YELP_DATA=yelp.tgz
YELP_DATA=middle_final.tar.gz
case "$1" in
"get-yelp")
mkdir -p data
wget https://at.ispras.ru/owncloud/index.php/s/ilMufYYEqA8bUvT/download -O data/$YELP_DATA
wget https://at.ispras.ru/owncloud/index.php/s/xkRJfXHOQQs4cpH/download -O data/$YELP_DATA
;;
"reindex-yelp")
#if [ -f data/$YELP_DATA ]; then
# stop container
if [ -f data/$YELP_DATA ]; then
# stop container
$0 down
# cleanup
sudo rm -rf var-solr
......@@ -18,12 +18,12 @@ case "$1" in
# start
$0 all
# index all data (decompress on the fly and pass to index)
#tar xzf data/$YELP_DATA final.json -O | curl 'http://localhost:8983/solr/yelp/update?commit=true' --data-binary @- -H 'Content-type:application/json'
sleep 10
cat data/middle_final.json | curl 'http://localhost:8983/solr/yelp/update?commit=true' --data-binary @- -H 'Content-type:application/json'
# else
# echo "Please use ./run get-yelp to obtain yelp data"
# fi
tar xzf data/$YELP_DATA middle_final.json -O | curl 'http://localhost:8983/solr/yelp/update?commit=true' --data-binary @- -H 'Content-type:application/json'
# cat data/middle_final.json | curl 'http://localhost:8983/solr/yelp/update?commit=true' --data-binary @- -H 'Content-type:application/json'
else
echo "Please use ./run get-yelp to obtain yelp data"
fi
;;
"build")
sudo docker-compose build
......
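Note on the `sleep 10` in `reindex-yelp`: the fixed delay presumably gives Solr time to start before the `curl` upload. A minimal sketch of a polling alternative follows. It is not part of this commit. The helper names `solr_is_up` and `wait_until` are hypothetical, and the ping URL in the usage comment assumes the `yelp` core exposes Solr's `/admin/ping` handler on the same host and port the script already uses.

```python
import time
import urllib.error
import urllib.request


def solr_is_up(url, timeout=2.0):
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not ready yet".
        return False


def wait_until(probe, attempts=30, delay=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, pausing `delay` seconds between tries.

    Returns True as soon as probe() succeeds, False if every attempt fails.
    """
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            sleep(delay)
    return False


# Usage (assumed URL, adjust to the actual setup):
#   ping = "http://localhost:8983/solr/yelp/admin/ping"
#   ready = wait_until(lambda: solr_is_up(ping))
```

Polling returns as soon as the server answers and fails explicitly after the last attempt, whereas a fixed sleep can be either too short or needlessly long.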
import json
import random
import gzip
import tarfile
with open('../data/middle_final.json') as json_file, open('words.txt', "w") as words_file, open('cities.txt', "w") as cities_file:
data = json.load(json_file)
with tarfile.open('../data/middle_final.tar.gz') as tar, open('words.txt', "w") as words_file, open('cities.txt', "w") as cities_file:
f = tar.extractfile("middle_final.json")
data = json.loads(f.read())
for item in data[:500]:
words = list(map(str, item['name'].split()))
word = random.choice(words)
......
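Note on the archive change above: the new version reads `middle_final.json` directly from the tarball instead of from an unpacked file. The same idea can be sketched as a small reusable helper. This is an illustration under assumptions, not the author's code: the function name `load_json_from_tar` is hypothetical, and it adds a check for the case where `extractfile` returns `None`.

```python
import json
import tarfile


def load_json_from_tar(archive_path, member_name):
    """Parse one JSON member of a tar archive without unpacking it to disk."""
    # Default mode "r" detects gzip compression transparently.
    with tarfile.open(archive_path) as tar:
        member = tar.extractfile(member_name)  # raises KeyError if the name is absent
        if member is None:
            # extractfile returns None for members that are not regular files.
            raise ValueError(f"{member_name} is not a regular file in {archive_path}")
        with member:
            return json.load(member)


# Usage, with the paths from the diff above:
#   data = load_json_from_tar('../data/middle_final.tar.gz', 'middle_final.json')
```

Closing both the archive and the member through `with` avoids leaking file handles, which the inline version in the diff leaves to the garbage collector for `f`.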